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Preface 



Since the publication of Alfred Haar's work on orthogonal function systems a 
hundred years ago, the world has witnessed a tremendous growth in the theory 
and practice of wavelet, even though a reported, systematic study of the subject field 
and its applications to engineering did not occur until the 1980s. Over the last 3 
decades, a plethora of literature has been published, describing advancement in the 
wavelet theory and its successful applications in various fields of engineering: from 
image processing in biomedical engineering to signal processing in meteorology to 
bridge monitoring in civil engineering. The adaptive, multiresolution capability of 
the wavelet transform has also made it a powerful mathematical tool for the 
diagnostics of equipment operation conditions in manufacturing, such as tool 
breakage. 

Past research on wavelets has been translated into a large volume of publications 
and significantly impacted the state-of-the-technology. These papers, together with 
a series of classic books, have taught generations of engineers the theory and 
applications of wavelets. Nevertheless, there exists a gap in the literature that is 
particularly dedicated to graduate students and practicing engineers in 
manufacturing who are interested in learning about and applying the theory of 
wavelet transform to solving problems related to equipment and process monitor- 
ing, diagnostics, and prognostics in manufacturing. 

The book is intended to bridge such a gap by presenting a systematic yet easily 
accessible treatment of the mathematics of wavelet transform and illustrate, in 
concrete terms, how wavelet transform as a mathematical tool can be realized for 
applications in manufacturing. Contributing to the understating and adoption of 
wavelets by the manufacturing community is the primary motivation for this book, 
and the 12 chapters included herein provide an overview of some of the latest 
efforts in this vibrant field. 

To establish a common ground for the treatment of signals, which is the focal 
point of this book. Chap. 1 starts by introducing a general classification scheme of 
signals typically countered in mechanical systems, from the point of view of their 
statistical behavior - deterministic and nondeterministic signals. Using a mass- 
spring-damper system as a physical embodiment, the analytical expression, wave- 
form, and solution of deterministic signals are first illustrated. These are then 
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contrasted against the nondeterministic family of signals, and the concept of 
nonstationary, which provides the fundamental motivation for dedicating this 
book to wavelet transform in manufacturing, is introduced. Taking signals 
measured in two representative manufacturing processes as a realistic example, 
the link between manufacturing and signal processing, as well as the need for 
properly treating nonstationary signals, are established, motivating the dedication 
of the book to this subject matter. 

Chapter 2 reviews several major events occurred in the field of signal processing 
since the invention of the Fourier transform in the nineteenth century, thereby 
recognizing the historical significance of spectral analysis. Such events have 
initiated and accompanied the conceptualization, formulation, and growth of the 
theory of wavelet transform. Based on the concept that signal transformation (for 
revealing the information content of the signal) can generally be represented by a 
convolution operation between the signal and a known template function, we 
sought to illustrate the common ground shared by the Fourier transform as well 
as its enhanced version (the short-time Fouriertransform), which has a fixed length 
of the analysis window, and the wavelet transform, which features an analysis 
window of variable length. 

The next three chapters. Chaps. 3-5, are devoted to introducing the fundamental 
mathematics involved in understanding what wavelet transform is and does, and 
how to apply it to decompose nonstationary signals as typically encountered in 
manufacturing. Aware of the existence of many excellent books on wavelets and at 
the same time, the recognized need by many graduate students and practicing 
engineers for a step-by-step treatment of some of the mathematical procedures 
involved to implement the wavelet transform, in terminologies familiar to engi- 
neers, we tried to take a balanced approach when writing these chapters. Specifi- 
cally, we introduced the continuous version of the wavelet transform in Chap. 3, by 
first drawing the resemblance between a continuous, sinusoidal wave and a time- 
localized wavelet that is essentially a linear, integral transformation satisfying the 
admissihility condition. To provide the readers with a handy access to some of the 
most often encountered properties of the continuous wavelet transform (CWT) in 
one place, we included descriptions of concepts such as superposition, covariance 
under translation and dilation, and the Mayol principle, together with a mathemati- 
cal proof, for each of these properties. By providing detailed proofs, we wish to 
encourage readers who might have initially felt intimidated by the wavelet mathe- 
matics to gain some confidence in approaching the topic from a practical yet 
mathematically rigorous perspective, instead of resorting to a strictly recipe type 
of operations. We then proceeded to give a step-for-step procedure for implement- 
ing the CWT, in two ways, such that readers can see, in concrete terms, where all 
the background information finally leads to, in terms of performing CWT on some 
representative signals. 

Chapter 4 introduces the discrete version of the wavelet transform, or DWT. The 
chapter is motivated by the recognition that CWT, while enabling a 2D decompo- 
sition of signals in the time-frequency (via the scale) domain with high resolution, 
is computationally complex due to the generation of redundant data. In comparison, 
the DWT is computationally more efficient, thus it is better suited for image 
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compression and real-time applications. Using logarithmic discretization as an 
example, we first discussed how parameters are discretized to guarantee correct 
information retrieval as a result of the DWT process. Several derivation details are 
provided to illustrate the thought process. We then moved to the dyadic discretiza- 
tion method that allows for orthogonal wavelet basis to be constructed, based on the 
theory of multiresolution analysis (MRA). To satisfy readers who may be interested 
in knowing a bit more about the "why" and "how" related to MRA, we supple- 
mented the explanation with several mathematical details, and illustrated why the 
process of DWT will lead to the generation of detailed and approximate informa- 
tion. In this context, we demonstrated that DWT, in essence, is about performing a 
series of low-pass and high-pass filtering operations, which can be implemented by 
following Mallat's algorithm. Mirroring the structure of Chap. 3, we presented 
several commonly used wavelets for DWT and illustrated how they can be used for 
applications such as signal denoising, by means of soft and hard thresholding. 

While enabling flexible time-frequency resolution in signal decomposition, the 
relatively low resolution of DWT in analyzing the high-frequency region gives rise 
to the wavelet packet transform (WPT), which is the focus of Chap. 5. After a brief 
coverage of its definition and basic properties, two algorithms for implementing the 
WPT - the recursive algorithm developed by Mallat and a Fourier transform-based 
algorithm that leads to the harmonic wavelet packet transform - are introduced. We 
then illustrated how a signal's time-frequency composition of a vibration signal, 
which relates directly to the working state of manufacturing equipment, can be 
revealed by the WPT, and how WPT can be applied to removing Gaussian noise 
from a chirp signal. These applications exemplify how the enhanced resolution of 
WPT can provide an attractive tool for detecting and differentiating transient 
elements with high-frequency characteristics. 

With the fundamentals of wavelet transform covered. Chaps. 6-8 describe 
several application scenarios where the effectiveness of wavelet transforms are 
demonstrated. The first application relates to signal enveloping, a technique com- 
monly used for nondestructive testing and structural defect identification. Addres- 
sing the limitation of enveloping in requiring a priori knowledge for choosing the 
filtering band to extract a signal's envelope, an adaptive, multiscale enveloping 
method (MuSEnS) that is rooted in the wavelet transform is described in Chap. 6, 
which effectively overcomes the limitation. Taking advantage of the Hilbert trans- 
form in extracting the envelop of an analytic signal and the fact that performing 
wavelet transform on a signal using a complex-valued base wavelet will result in an 
analytic signal, the chapter illustrates how a signal's envelope can be readily 
calculated from the modulus of the corresponding wavelet coefficients. To illustrate 
the effectiveness of this technique in signal decomposition, two manufacturing- 
related applications - differentiation of ultrasonic pulses that are timely overlapped 
and spectrally adjacent for wireless pressure measurement in injection molding and 
bearing defect diagnosis in rotary machines - are demonstrated, using both experi- 
mentally measured signals and synthetic signals for quantitative evaluation. 

While the localized signal decomposition capability of wavelet transform is 
particularly useful for transient events identification, the result of wavelet transform 
does not explicitly reveal distinct characteristic frequencies that are often times 
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indicative of defective modes of a machine, e.g., the ball passing frequency at the 
inner raceway of a rolling element, when a localized spalling is present. In such 
situations, the effectiveness of wavelet transform can be leveraged by the Fourier 
transform in identifying a signal's frequency components. This leads to the formu- 
lation of a unified time-scale-frequency analysis technique that adds spectral post- 
processing to the data set extracted by wavelet transform, for enhanced defect 
diagnosis. In Chap. 7, we demonstrate such an integrated method, in the context 
of a generalized signal transformation frame. An expression for both the Fourier 
transform and wavelet transform in the generalized frame is first presented, estab- 
lishing the basis for crossdomain unification of the two transforms. Next, the 
viability of postspectral analysis of wavelet processed data is analytically justified, 
and the effectiveness of the technique in identifying the bearing defects under 
various operating conditions is demonstrated. 

A question that naturally arises upon defect detection is the severity of the 
defect, which affects the proper scheduling of maintenance. To answer this ques- 
tion, we demonstrate in Chap. 8 how WPT can be applied in classifying machine 
defect severity, using vibration signals from rolling bearings as an example. We 
start the discussion by associating /ea/Mre.s (e.g., energy content or Kurtosis value) 
of a signal with the subfrequency bands of its decomposition, enabled by the WPT, 
and demonstrate how WPT can flexibly extract features from the subfrequency 
bands of the decomposed signal where the features are concentrated. The chapter 
further contains a discussion on how to process the features, once they are obtained, 
for classification purpose. Relevant techniques for selecting best-suited features 
using the Fisher linear discriminant analysis and principal component analysis, and 
classifying features to quantify defect severity levels are described. Two case 
studies presented toward the end of the chapter on ball and roller bearings confirm 
the validity of WPT for defect severity classification. 

Chapter 9 continues the discussion on signal classification, with a focus on how 
it can be applied to differentiate different working conditions of a machine, for the 
purpose of diagnosis. The concept of discriminant features is first introduced, and a 
technique called the local discriminant bases (LDB) is described in detail. In a 
nutshell, the LDB algorithm determines an optimal set of wavelet packet nodes, 
each of which corresponding to a wavelet packet basis, to represent signals acquired 
under different machine states as different classes. Similar to the Shannon entropy 
feature introduced in Chap. 5 for signal compression, several features suited for 
diagnosis of rotating machines, e.g., relative entropy or correlation index, are 
identified in this chapter. We provided a step-by-step description of the LDB 
algorithm, for readers to see how the algorithm can be implemented. Using three 
synthetic signals with added white noise and vibration signals measured on a 
gearbox under different states of wear, we quantitatively demonstrated how the 
wavelet packet bases constructed using the LDB algorithm can more successfully 
differentiate and classify these signals than without using the LDB. 

Given the abundance of the base wavelets in the published literature, it is natural 
to ask the question as to how to choose an optimal base wavelet for analyzing a 
particular type of signal. This is based on the understanding that (1) the choice of 
base wavelet made in the first place will affect the result obtained at the end, and 
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(2) each base wavelet may be developed for different purposes and emphasis; 
therefore, an educated approach to their selection is needed when solving a specific 
type of engineering problems. In this book, we have tried to address this issue of 
significant intellectual interest in two ways. First, in Chap. 10, we introduced a 
general strategy for base wavelet selection, using both qualitative measures (e.g., 
orthogonality and compact support) and quantitative measures (e.g.. Shannon 
entropy and discrimination power). Subsequently, we presented several criteria 
for base wavelet selection, including the energy-to-Shannon entropy ratio and the 
maximum information measure. Using both real-valued and complex-valued base 
wavelets, we demonstrated how these criteria can be applied to selecting the best- 
suited base wavelet from a pool of candidates to decompose both a numerically 
formulated Gaussian-modulated sinusoidal test signal and a vibration signal 
measured on a defective ball bearing, thus confirming the effectiveness of these 
criteria. 

Besides investigating how to choose an appropriate base wavelet from the 
existing library, another approach is to design a customized wavelet that is adapted 
to a specific type of application to maximize the degree of matching with the signal 
of interest, thus improving the effectiveness of feature extraction. Such a comple- 
mentary technique is the focus of Chap. 1 1 . After reviewing the fundamental issues 
involved in the wavelet design process and several customized wavelets, we 
described in detail the process of designing an impulse wavelet for bearing vibration 
analysis, based on the impulse response of the mechanical structure where the 
bearing is housed. The importance of satisfying the dilation equation to avoid 
information loss in the signal reconstruction is stressed, and the procedure of 
meeting this requirement is illustrated. Using the designed impulse wavelet, vibra- 
tion signals from a defective bearing are analyzed, and the result is compared with 
that from using five standard wavelets available in the library, using the signal-to- 
noise ratio for the defect-characteristic frequency as the measure for comparison. 
The good performance of the impulse wavelet confirms the validity of the analytical 
procedure described in developing customized wavelets for enhanced signal analy- 
sis in a broad range of applications in engineering. 

The last chapter of the book provides a brief survey of some new advancement 
reported in recent years that goes beyond the classical wavelet transform. These 
latest developments address some of the fundamental limitations inherent to the 
wavelet transform, e.g., when it is used to analyze signals of finite length and/or 
limited duration, or for capturing and defining image boundaries. We started the 
survey by introducing the second generation wavelet transform, or SGWT, which 
uses the so-called lifting scheme to replace the traditional mechanism of wavelet 
construction that uses translation and dilation. Major operation steps for realizing 
the SGWT, such as splitting, prediction, and updating, are described, and the 
effectiveness of the technique for separation and reconstruction of an intermittent 
linear chirp signal is demonstrated. Addressing the inherent limitations of classical 
wavelets (e.g., isotropic) and the specific challenges in image processing (e.g., in 
resolving image boundaries), we then introduced the ridgelet and curvelet trans- 
forms. The former was developed to address the need for analyzing anisotropic 
features in images, whereas the latter enables improved representation of curved 
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boundaries in images. For each transform, we have presented the definition and 
basic properties, and demonstrated a representative application in manufacturing. 

As is true with any book, the writing reflects upon the authors' understanding of 
and knowledge about the subject matter. While we have strived to present to the 
readers a composition that is both rigorous in the mathematical treatment and 
relevant in the examples chosen to complement the theory, it is inherently difficult 
for a work of this size to be completely free of errors. We bear the responsibility for 
anything that is not correctly stated in the book and would greatly appreciate 
hearing from our readers about any mistakes they found such that we can correct 
them in future printings. 

We thank the anonymous reviewers for their insightful and constructive com- 
ments that have both sharpened our thinking and provided clues to improving the 
pedagogic presentation. We are indebted to former graduate students at the Elec- 
tromechanical Systems (EMS) Laboratory, whose intellectual contributions have 
made the book a reality. In particular, we thank Drs. Brian Holm-Hansen, Changt- 
ing Wang, and Li Zhang, who dedicated a substantial part of their doctoral research 
to the study of wavelet for diagnosis in manufacturing equipment and processes, 
and Dr. Qingbo He, who spent a year as a postdoctoral research fellow at the EMS 
Laboratory, working on the characterization of physical activities, for their dedica- 
tion to and enthusiasm in exploring the world of wavelets, which has laid the 
foundation for this book. We also thank the US National Science Foundation for 
funding a number of relevant projects, which has allowed us to systematically study 
this fascinating subject. 

This book-writing project was initially planned to be completed in 1 year, but a 
number of events that took place in between have delayed the writing and made the 
time needed for ultimately completing the book nearly twice as long. We take this 
opportunity to express our sincere appreciation to the publisher for supporting this 
project. In particular, we thank Mr. Stephen Elliot, Senior Editor for Engineering, 
and Mr. Andrew Leigh, editorial assistant, for their earnest cooperation, editorial 
assistance, and above all, patience, which has created a relaxed environment for us 
to finish the writing while juggling many other deadline issues. Last but not the 
least, we sincerely thank our respective families for their understanding and carry- 
ing the load for us during the course of this project so that we could devote as much 
time as possible to the writing of the book. It is our hope that the book is worth their 
selfless support, and our readers will find in it something that is of value to their 
research. 
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Chapter 1 

Signals and Signal Processing in Manufacturing 



The term "signal" refers to a physical quantity that carries certain type of 
information and serves as a means for communication. As an example, the output 
of an accelerometer in the form of a voltage that varies with time is a signal that 
carries information about the vibration of the structure (e.g., a machine tool) on 
which the accelerometer is installed. Such a signal can serve as a means for 
communicating the operation status of the machine tool to the machine operator. 



1.1 Classification of Signals 

In general, any signal can be broadly classified as being either deterministic or 
nondeterministic (Bendat and Piersol 2000). Deterministic signals are those that can 
be defined explicitly by mathematical functions. An example is the vibration caused 
by imbalance in a rolling bearing, when the bearing's gravitational center does not 
coincide with the rotational center. Nondeterministic signals, in comparison, are 
random in nature and are described in statistical terms. An example is the acoustic 
emission signals generated during a machining process. In real-world applications, 
whether a measured signal is deterministic or nondeterministic depends on its 
reproducibility. A signal that can be generated repeatedly with identical results is 
considered to be deterministic, otherwise it is nondeterministic. 



/././ Deterministic Signal 

There are two types of deterministic signals: periodic and transient. They are briefly 
explained and illustrated in the following. 

1.1.1.1 Periodic Signal 

A periodic signal is defined as a function that repeats itself exactly after a certain 
period of time, or cycle. Such a signal is mathematically expressed as 
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Fig. 1.1 A single-degree-of- 
freedom (SDOF) mass- 
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In the above equation, Z represents the integer set, n is an integer, and T > 
represents the period. The simplest example of a periodic signal is the sinusoidal 
signal. 

In practice, many physical systems can produce such a type of signal. A typical 
scenario is a single-degree-of-freedom (SDOF) mass-spring-damper system (Rao 
2003). As illustrated in Fig. 1.1, the mass m is attached to the wall through a spring 
k and a damper c, and can vibrate in the horizontal direction. The motion 
(or displacement) of the mass-spring-damping system under input F{t) is expressed as 



mx{t) + cx{t) + kx{t) = F{t) 



(1-2) 



where x{t) is the displacement of the mass, x{t) the velocity of the mass, and x{t) the 
acceleration of the mass. 

Let us suppose that the system is under free vibration, with the external forcing 
input F{t) being zero. Also assume that the damping coefficient c = 0. If the system 
is initially pulled away from the equilibrium position by a distance Aq and released 
with the initial velocity equal to zero, so that 



x{t^O) 



x{t = 0) = 



(1-3) 



then the solution of ( 1 .2) will generate a periodic signal with the period T = 2n/oj„. 
This will be a cosine function, as illustrated in Table 1.1a 

A complex periodic signal can also be generated from the same system (Fig. 1.1) 
with c — 0, when the system is subject to a harmonic forcing input, F{t) — F cos(cof) . 

As illustrated in Table 1.1b, the complete response can be expressed as the sum 
of cosine waveforms of two different frequencies. 



1.1.1.2 Transient Signal 



A transient signal is defined as a function that lasts a short period of time. Such a 
signal can be generated by the system shown in Fig. 1.1, with the damping 
coefficient c ^ and free vibration, as illustrated in Table 1.1c. 



1.1 Classification of Signals 

Table 1.1 Example of deterministic signals 



Mathematical function 



Waveform 



(a) A simple periodic signal 
Condition 
c = 

F{t) = 
Solution 

A'(/) = Ao cos(co„?) 



(b) A complex periodic signal 
Condition 
c = 

F{t) =Fcos{cot) 
Solution 

x{t) = A\ cos(a)„/) + A2 cos(cor) 



(c) A transient signal 
Condition 

F{t) = 
Solution 

x{t) = Aoe^'-"'"' cos{cOdt) 




x(t) 




x{t) 




(d) A mixed deterministic signal 
Condition 

F{t) =Fcos{wt) 
Solution 

x{t) = Aoe"^'""' cos(cOrf?) + A3 cos(a)f) 




s/ik-mm^-f- 



Periodic and transient signals are often mixed together in real-world applications. 
Such a signal can be generated, for example, by the system shown in Fig. 1 . 1 with the 
damping coefficient c ^ 0, under a harmonic force, as illustrated in Table l.ld. 



1.1.2 Nondeterministic Signal 



Nondeterministic signals, also called random signals, do not follow explicit mathe- 
matical expressions. They can be generally divided into two categories: stationary 
and nonstationary. 
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1.1.2.1 Stationary Signal 

A signal x{t) is considered stationary when none of its statistical properties change 
with time. Generally, wide-sense stationary (Bendat and Piersol 2000) is used to 
characterize the signal. This requires that it satisfies the following conditions on its 
mean function: 

E{x{ti)} = m,{ti) = m,{ti + t) tgZ (1.4) 

and the autocorrelation function: 

E{x{ti),x{ti + t)} = /?,,-(?i, ri + t) ^ 7?,,(0, t) t G R (1.5) 

In the above equations, the symbol t is the real number, R is defined as the real 
number set, and R^^ is the autocorrelation function of the signal x{t) . Equation ( 1 .4) 
indicates that the mean function mv(r) must be time-invariant or remain unchanged 
as time goes by. As shown in (1.5), the autocorrelation function of the signal 
depends only on the time difference t. The mean function and autocorrelation 
function of a signal can be obtained by time-averaging over a short time interval 
T as follows: 

EUih)}^^J' -<t)dt (1.6) 



and 



1 /•'■+^ 
E{x{tx),x{tx +%)] = - I x{t)x{t+x)dt (1.7) 



'1 



Table 1 .2a illustrates an example of a stationary signal, which satisfies the two 
conditions expressed in (1.5) and (1.6). 



1.1.2.2 Nonstationary Signal 

A signal whose statistical properties change with time is called a nonstationary 
signal. As a result, a nonstationary signal does not satisfy the conditions specified 
in (1.4) and (1.5). Table 1.2b illustrates a nonstationary signal. 

It should be noted that signal classification method as described above is not 
rigid and exclusive. No signals encountered in the real-world are exactly determin- 
istic. Furthermore, there exist other means to classify a signal. For example, a signal 
can be considered as being either linear or nonlinear, as defined by the superposition 
principle. An SDOF mass-spring system is considered linear, if a linear relationship 
exists between the force input to the system and its corresponding displacement 
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Table 1.2 Example of nondeterministic signals 



(a) Stationary signal 



(b) Nonstationary signal 




T, = T,= T, 




T, * T, * T, 



output. In real-world applications, a signal may contain some or several of the 
components described above. 
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Signals are ubiquitously present in manufacturing machines and systems. For 
example, metal removal is essential to many manufacturing processes, as seen in 
turning, milling, and drilling (Schey 1999). During such a process, interactions 
between the cutting edge of the tool and the workpiece lead to removal of fragments 
of varying volumes, producing whereby time-varying or transient components in 
the vibration signals. Figure 1.2 illustrates the waveform of a vibration signal 
measured on a CNC milling machine center (shown in Fig. 1.3) when it is in 
production. 

Another manufacturing process where transient signals may present is sheet 
metal stamping. The physical setup of a sheet metal stamping operation consists of 
three main components, namely, the die, the binder, and the punch (Suchy 2006), as 
shown in Fig. 1 .4. During a stamping operation, the periphery of the sheet metal 
workpiece is held between the binder and die flange. As the punch moves down, the 
workpiece is pressed into the die, causing plastic deformation in the workpiece 
material. The flow of the workpiece material into the die is regulated by the binder 
force (Ahmetoglu et al. 1992; Koyama et al. 2004). 

To characterize the stamping process, tonnage measurement has been conducted 
by placing accelerometers on the columns of the stamping machine. In Fig. 1.5, 
the output of an accelerometer is shown, in which four different phases of the forming 
operations are characterized: press idle, travel, contact, and free vibration. When the 
press is idle, the punch runs down until the binder touches the workpiece at point A. 
Then travel starts, and the signal amplitude increases as the stamping force increases, 
until it reaches point B where the punch touches the workpiece. After that, contact 
between the punch and workpiece is established, and metal forming starts. The signal 
quickly increases to its maximum at point C, as the punch pushes the sheet metal into 
the die. At point D, the forming process is completed, and the amplitude of the 
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Fig. 1.2 Vibration signal measured during a milling process 




Fig. 1.3 A CNC milling machine center (Haas Automation, Inc., http://www.haascnc.com) 



vibration signal quickly drops to zero. After point E, vibration of the stamping 
machine diminishes with time, until the next stamping operation starts. 

For nonmetallic material processing, injection molding is widely employed 
because of its capability in mass production of plastic parts. Figure 1.6 illustrates a 
typical injection molding machine. The injection molding process generally consists of 
four stages (Potsch and Michaeli 1995; Bryce 1996; Johannaber 2008): (1) plastication, 
where the raw material is melted in the barrel, (2) injection, during which the melted 
polymer is injected into the mold cavity, (3) packing, holding, and cooling, when 
additional polymer melt is forced into the cavity under high pressure to compensate 
for the volumetric shrinkage until the part is sufficiently solidified, and (4) ejection, 
where the mold opens and the part is ejected out of the mold by the push pins. 

During each injection molding cycle, pressure within the mold cavity varies, 
as illustrated in Fig. 1.7. Such time-variation serves as a measure for identifying 
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Fig. 1.4 A typical stamping machine (BowStar Biz Management Ltd) 
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Fig. 1.5 A typical tonnage signal during stamping process 
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Fig. 1.6 A typical injection molding machine (Ferromatik Milacron) 
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Fig. 1.7 Pressure signal measured during an injection molding process 



and characterizing the various stages of the molding process. At point ® where 
the plasticized polymer enters the cavity, pressure starts to increase from zero in 
an approximately linear gradient relative to the duration of filling time. When the 
melt reaches the end of the cavity at point (D, the material is compacted to ensure 
reproduction of the mold cavity contour. Such a process is indicated by a fast 
pressure ramping rate as shown in the curve from ® to (D. During the holding phase 
from @ to ®, a constant holding pressure is applied to the melt to compensate for 
the contraction of the polymer by injecting additional material into the cavity. As 
the molded part starts to cool down and solidify, viscosity of the material increases 
and the flow channel becomes constricted. As a result, pressure drop is seen from 
the sensor data as indicated by the section from ® to ®. 

The close association between signals and manufacturing, in addition to the 
various processes as illustrated above, is also seen in various components that have 
been employed in various machine equipment. One representative is the rolling 
bearings, which have been widely applied to providing loading support 
and rotational freedom in manufacturing, transportation, aerospace, and defense 
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(e.g., machine tools, trains, helicopters, power generator, etc.). Because of faulty 
installation, inappropriate lubrication, and other unpredictable adverse conditions 
during bearing operations, premature failure of bearings may occur, for example, in 
the form of surface spalling on the bearing raceways. As a result, impulsive signals 
will be generated every time when the rolling elements interact with the defects. 
These impulsive signals subsequently excite the machine system, leading to forced 
vibrations. Figure 1.8 illustrates a customized spindle system where bearings are 
installed. Vibration signals measured at two stages during a run-to-failure 
experiment on the spindle system are shown in Fig. 1.9. 



Fig. 1.8 A customized 
spindle-bearing test system 
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Fig. 1.9 The vibration signals from bearing run-to-failure test, (a) Signal measured from stage I 
when defect is at the initial stage and (b) signal measured from stage II, where defect has grown 
significantly 
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Gearbox, as illustrated in Fig. 1.10, has been employed in a wide range of 
machinery and control systems, because of its ability in transferring both 
power and motion with high efficiency. When a defect is developed in a gear, the 
vibration signal of the gearbox will contain amplitude and phase modulations 
that are periodic with respect to the rotation of the gear. Figure 1.11 shows 
an example of vibration signals measured on a gearbox under different rurming 
conditions. 




Fig. 1.10 An automobile transmission gearbox (Topic Media PTY LTD) 
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Fig. 1.11 Acceleration signals measured on a gearbox: (a) normal condition, (b) slight fault 
condition, and (c) severe fault condition 
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1.3 Role of Signal Processing for Manufacturing 

Growing demand for high-quality and low-cost production has increased the 
need for condition monitoring, health diagnosis, and enhanced controls in 
manufacturing equipment and processes (Tonshoff et al. 1988; Byme et al. 1996; 
Ganesan et al. 2004; Liang et al. 2004). Accordingly, sensor-based information 
acquisition and processing systems have gained increasing attention from the 
research community worldwide (Teti 1996; DimlaSnr 2000; Tseng and Chou 2002; 
Frankowiak et al. 2005). The goal of these efforts is to obtain information in real-time 
about the operation status of the machines and use the information for the following 
purposes: 

1 . Identification of machine faults at the incipient stage such that proper corrective 
measures can be taken before the faults have progressed to cause significant 
structural damage and costly downtime, thus enabling adaptive instead of fixed- 
time maintenance and production scheduling 

2. More accurate control of the quality of products being manufactured, which is 
directly related to the working conditions of the machine 

In addition to monitoring individual machines, data gathered from the sensors 
provide insight into the manufacturing process itself, and can be used to assist in 
high-level decision-making for production optimization. 

Signals encountered in manufacturing machines typically consist of three major 
components: 

1 . A periodic component resulting from the cyclic interactions between the inter- 
facing elements of the machine, such as vibrations caused by the interaction 
between the rolling elements and the raceway 

2. A transient component caused by "one-time" events, such as the sudden breakage 
of a drilling bit or the inception of a crack inside a workpiece 

3. Broadband background noise 

Detection of the existence of these signals in real-time during the manufacturing 
process and extracting relevant information from the signals in a timely manner 
are of significant interest and importance, as they are precursors of potential 
machine defect and product quality deterioration that will negatively impact the 
manufacturing processes. On the other hand, detection of such signals can be 
challenging, as these signals are generally short in duration and weak in amplitude. 
Often times, they can be buried under strong background noises, making 
their detection difficult (Gu et al. 2002; Padovese 2004; Shi et al. 2004). Further- 
more, the one-shot nature of these signals makes the assumption for stationary 
signals invalid, thus reducing the effectiveness of conventional signal processing 
techniques. For example, while Fourier transform has been extensively used in 
conjunction with filtering techniques, its effective utilization depends upon signals 
containing distinct characteristic frequency components of sufficient energy con- 
tent, within a limited frequency band. If the feature components spread over a wide 
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spectrum, it would be difficult to use Fourier transform to differentiate them from 
disturbing or masking components, especially when the feature components are 
weak in magnitude. This has been shown in condition-monitoring studies of bear- 
ings with an incipient defect (Mori et al. 1996). 

Time-frequency and time-scale techniques have been the subject of extensive 
research over the past decade for nonstationary signal analysis. Typical representa- 
tives include the short-time Fourier transform (STFT) and wavelet transform (Li and 
Ma 1997; Satish 1998). STFT was developed to address the limitation of the Fourier 
transform, which is rooted in its basis functions extending over an infinite period of 
time. As a result, Fourier transform is not well adapted to nonstationary transient 
signals with short durations. A solution to this problem is to perform a "time 
localized" Fourier transform within a sliding window, as in the case of STFT 
(Chui 1992). Popular choices for the window function include the Hamming, the 
Hann, and the Gaussian functions. When a Gaussian window is chosen, the STFT is 
called a Gabor transform (Gabor 1946). A one-sided Gaussian window has been 
used for detection of transient signals in a workpiece (Friedlander and Porat 1989). 
The disadvantage of the STFT is that its time resolution (the smallest separation in 
time of two signal components that can be discriminated) and bandwidth cannot be 
chosen to be simultaneously small, according to the uncertainty principle (Cohen 
1989). The time-bandwidth product of the STFT must be greater than or equal to the 
inverse of An. The equal sign holds only when the window function is a Gaussian 
function. This means that the time-frequency resolution over the entire time-fre- 
quency plane is fixed, once the window function is chosen. As a result, a trade-off 
must be made between the time resolution and frequency resolution, when the STFT 
is applied to transient signal analysis. 

To overcome the resolution limitation of the Gabor transform, the wavelet 
transform has been increasingly investigated for nonstationary signal analysis 
(Mallat 1989; Daubechies 1990, 1992; Rioul and Vetterli 1991). In contrast to 
the Gabor transform with fixed windows, the wavelet transform uses short 
windows at high frequencies and long windows at low frequencies (Rioul and 
Vetterli 1991). Such a nature leads to the wavelet transform being called the 
constant relative-bandwidth frequency analysis. Unlike the Fourier transform, 
which expresses a signal as the sum of a series of single-frequency sine and cosine 
functions, the wavelet transform decomposes a signal into a set of basis functions. 
These basis functions are obtained from a single base wavelet function by a 
two-step operation: scaling (through dilation and contraction of the base 
wavelet along the time axis, as will be explained in Chap. 2), and time shift 
(i.e., translation along the time axis). Essentially, the wavelet transform process 
measures the "similarity" between the signal being analyzed and the base wavelet. 
Through variations of the scales and time shifts of the base wavelet function, 
features hidden within the signal can be extracted, without requiring the signal 
to have a dominant frequency band. 

Research on manufacturing machine and process monitoring and diagnosis 
using the wavelet transform has attracted increasing attention worldwide. For 
example, the adaptive capability has made wavelet transform a good analytical 
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tool for decomposing gearbox vibration signals. Studies have demonstrated its 
ability to detect incipient failures as well as differentiating different types of defects 
(Wang and McFadden 1993, 1995; Zheng et al. 2002). The discrete wavelet 
transform has been applied to analyzing spindle motor current for tool failure 
diagnosis in end-milling, under varying cutting conditions (Lee and Tarng 1999). 
Similar studies of wavelet transform for machine tool monitoring have been 
reported (Fu et al. 1998; Li et al. 2000). For detecting localized bearing defects 
and/or estimating the defect severity level, the advantage of wavelet transform has 
been extensively investigated (Wang and Gao 2003; Lou and Loparo 2004; Yan and 
Gao 2005; Chiementin et al. 2007; Wang et al. 2009), and the results have shown its 
superior performance over the conventional, Fourier-based approaches. Other appli- 
cations of wavelet transform, including singularity detection (Sun and Tang 2002), 
denoising and extraction of weak signals (Altmann and Mathew 2001; Lin 2001), 
vibration signal compression (Tanaka et al. 1997; Staszewski 1998), and system 
and parameter identification (Robertson et al. 1998; Kim et al. 2001), have also 
been reported. 

It can be concluded that the wavelet transform provides a powerful mathematical 
tool for the analysis, characterization, and classification of nonstationary signals 
typically seen in manufacturing. The adaptive, multiresolution capability of 
the wavelet transform makes it well suited for decomposing signals of varying 
time and frequency resolutions that are characteristic of the underlying defect 
mechanisms associated with a machine, a dynamical structure, or a manufacturing 
process. Such capability makes the wavelet transform an enabling tool for advancing 
the science base of signal processing in manufacturing. It is such significance and the 
associated potential impact that motivate this book, and it is the intention of the 
book to provide graduate students and practicing engineers with a systematic, 
comprehensive, yet easily accessible coverage of the fundamental theory and repre- 
sentative applications of wavelet transform in the broad and vibrant field of 
manufacturing research. 
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Chapter 2 

From Fourier Transform to Wavelet 

Transform: A Historical Perspective 



To ensure safe and economical operation and product quality, manufacturing 
machines and processes are constantly monitored and evaluated for their working 
conditions, on the basis of signals collected by sensors, which are generally 
presented in the form of time series (e.g., time-dependent variation of vibration, 
pressure, temperature, etc.). To extract information from such signals and reveal the 
underlying dynamics that corresponds to the signals, proper signal processing 
technique is needed. Typically, the process of signal processing transforms a 
time-domain signal into another domain, with the purpose of extracting the charac- 
teristic information embedded within the time series that is otherwise not readily 
observable in its original form. Mathematically, this can be achieved by represent- 
ing the time-domain signal as a series of coefficients, based on a comparison 
between the signal x{t) and a set of known, template functions {i/'„(?)}„g. as 
(Chui 1992; Qian 2002) 



xm:xt)dt (2.1) 



where (■)* stands for the complex conjugate of the function (•). The inner product 
between the two functions x{t) and i/'„(/) is defined as 

(x,^,) = I x{t)ily:{t)dt (2.2) 

Then (2.1) can be expressed in the general form as 

c„ = (x, il/„) (2.3) 

The inner product in (2.3), in essence, describes an operation of comparing the 
"similarity" between the signal x{t) and the template function {i/'„(0}„e-' "^^at is, 
the degree of closeness between the two functions. The more similar x{t) is to ij/^ (t) , 
the larger the inner product c„ will be. On the basis of this notion, this chapter 
presents a historical perspective on the evolution of the wavelet transform. This is 
realized by observing the similarities as well as differences between the wavelet 
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2 From Fourier Transform to Wavelet Transform: A Historical Perspective 
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Fig. 2.1 A nonstationary signal x(t) 



transform and other commonly used techniques, in terms of the choice of the 
template functions {i/'„(r)}„g_. To illustrate the point, a nonstationary signal 
as shown in Fig. 2.1 is used as an example. The signal consists of four groups 
of impulsive signal trains, each containing two transient elements of different 
center frequencies at 1,500 and 650 Hz, respectively. The four groups are separated 
from one another by a 12-ms time interval. Within each group, the two transient 
elements are time-overlapped. The sampling frequency used to capture the signal 
is 10 kHz. 



2.1 Fourier Transform 

The Fourier transform is probably the most widely applied signal processing tool 
in science and engineering. It reveals the frequency composition of a time series 
x{t) by transforming it from the time domain into the frequency domain. In 1807, 
the French mathematician Joseph Fourier (Fig. 2.2) found that any periodic signal 
can be presented by a weighted sum of a series of sine and cosine functions. 
However, because of the uncompromising objections from some of his contempor- 
aries such as J. L. Lagrange (Herivel 1975), his paper on this finding never 
got published, until some 15 years later, when Fourier wrote his own book. The 
Analytical Theory of Heat (Fourier 1822). In that book, Fourier extended his 
finding to aperiodic signals, stating that an aperiodic signal can be represented by 
a weighted integral of a series of sine and cosine functions. Such an integral is 
termed the Fourier transform. 

Using the notation of inner product, the Fourier transform of a signal x{t) can be 
expressed as 



Kf) 



„'W'\ 



x{t)e-'^''fi dt 



(2.4) 



2. 1 Fourier Transform 
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"An arbitrary function, 
continuous or with 
discontinuities, defined in a finite 
interval by an arbitrarily 
capricious graph can always be 
expressed as a sum of sinusoids" 
J.B.J. Fourier 



Fig. 2.2 Jean B. Joseph Fourier (1768-1830) 
Assuming that the signal has finite energy, 

\x{t)\^dt<oo 
Accordingly, the inverse Fourier transform of the signal x{t) can be expressed as 

/•oo 

x{t)= / X{f)e'^^f'df (2.5) 



Signals obtained experimentally through a data acquisition system are generally 
sampled at discrete time intervals AT, instead of continuously, within a total 
measurement time T. Such a signal, defined as x^, can be transformed into the 
frequency domain by using the discrete Fourier transform (DFT), defined as 



DFT{f„)=l^Y.^,e-^'^'"''' 

A-=0 



(2.6) 



where A' = T / h.T is the number of samples, and/„ = «/r, 77 = 0, 1, 2, ... ,A' — 1 
are the discrete frequency components. The inverse DFT can then be expressed as 



(N-\)IT 



Xk 



^ Y: DFT{f,,)e^^f^'^' 



(2.7) 



/»=o 



Equations (2.4) and (2.6) indicate that the Fourier transform is essentially a 
convolution between the time series x{f) or x^ and a series of sine and cosine 
functions that can be viewed as template functions. The operation measures the 
similarity between x{f) or x^ and the template functions, and expresses the average 
frequency information during the entire period of the signal analyzed. In Fig. 2.3, 
such an operation is graphically illustrated. 
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Fig. 2.3 Illustration of the Fourier transform of a continuous signal x(t) 



To compute the DFT of a signal with A' samples, multiplication of an A' x A? 
matrix that contains the primitive «th root of unity er''^^!'^ by the signal is needed. 
Such an operation takes a total of arithmetic operations on the order of A'' to 
complete. The computational time increases quickly as the number of the samples 
increases. For example, a time series oi N = 256 (i.e., 2**) samples takes 65,536 
operational steps to complete, whereas for N = 4,096 (i.e., 2'^), a total of 
16,777,216 steps will be needed to compute its DFT. The high computational 
cost limited the widespread application of the DFT in its early stage, until a more 
efficient algorithm, called the Cooley-Tukey algorithm, was introduced in 1965 
(Cooley and Tukey 1965). This algorithm is also called the fast Fourier transform 
(FFT), and what it does is to recursively break down a DFT of a large data sample 
(i.e., a large A') into a series of smaller DFTs of smaller samples by dividing the 
transform with size N into two pieces of size A^/2 at each step, and reduce the 
arithmetic operations to a total of A' log (A'). Comparing to the A'^ operations 
required for DFT, this represents a time reduction of up to 96%, when, for example, 
the data sample number A' is 256. 

In practice, the phenomena of leakage and aliasing can happen during the 
calculation of DFT (Komer 1988). Leakage is caused by the discontinuities 
involved when a signal is extended periodically for performing the DFT. Applying 
a window to the signal to force it to contain a full period can prevent leakage from 
happening. However, the window itself may contribute frequency information to 
the signal. Aliasing occurs when the Shannon's sampling theorem is violated, 
(Bracewell 1999) causing the actual frequency component to appear at different 
locations in the frequency spectrum. This can be solved by ensuring the sampling 
frequency to be at least twice as large as the maximum frequency component 
contained in the signal (Bracewell 1999). This requires, however, that the maxi- 
mum frequency component is known a priori. 

The Fourier transform of the signal shown in Fig. 2. 1 is illustrated in Fig. 2.4. 
The figure shows two major frequency peaks at 650 and 1,500 Hz, respectively. 
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Fig. 2.4 Fourier transform results of the signal x(t) 



However, it does not reveal how the signal's frequency contents vary with time; 
that is, the figure does not reveal if the two frequency components are continuously 
present throughout the time of observation or only at certain intervals, as is 
implicitly shown in the time-domain representation. Because the temporal structure 
of the signal is not revealed, the merit of the Fourier transform is limited; specifi- 
cally, it is not suited for analyzing nonstationary signals. On the other hand, as 
signals encountered in manufacturing are generally nonstationary in nature (e.g., 
subtle, time-localized changes caused by structural defects are typically seen in 
vibration signals measured from rotary machines), a new signal processing tech- 
nique that is able to handle the nonstationarity of a signal is needed. 



2.2 Short-Time Fourier Transform 



A straightforward solution to overcoming the limitations of the Fourier transform is 
to introduce an analysis window of certain length that glides through the signal 
along the time axis to perform a "time-localized" Fourier transform. Such a concept 
led to the short-time Fourier transform (STFT), introduced by Dennis Gabor 
(Fig. 2.5) in his paper titled "Theory of communication," published in 1946 
(Gabor 1946). 

As shown in Fig. 2.6, the STFT employs a sliding window function g{t) 
that is centered at time t. For each specific t, a time-localized Fourier transform 
is performed on the signal x{t) within the window. Subsequently, the window 
is moved by t along the time line, and another Fourier transform is performed. 
Through such consecutive operations, Fourier transform of the entire signal can 
be performed. The signal segment within the window function is assumed to 
be approximately stationary. As a result, the STFT decomposes a time domain 
signal into a 2D time-frequency representation, and variations of the frequency 
content of that signal within the window function are revealed, as illustrated in 
Fig. 2.6. 
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Fig. 2.5 Dennis Gabor 
(1900-1979) 




Time Shift r 




Fig. 2.6 Illustration of short-time Fourier transform on the test signal x(t) 



Using the inner product notation as before, the STFT can be expressed as 



STFT{T,f) = {x,g,j) ^ j x{t)glf{t)dt = j x{t)g{t - t)< 



e-^^'^f'dt (2.8) 



Equation (2.8) can also be viewed as a measure of "similarity" between the signal 
x{t) and the time-shifted and frequency-modulated window function g{t). Over the 
past few decades, various types of window functions have been developed (Oppen- 
heim et al. 1999), and each of them is specifically tailored toward a particular type 
of application. For example, the Gaussian window designed for analyzing transient 
signals, and the Hamming and Harm windows are applicable to narrowband, 
random signals, and the Kaiser-Bessel window is better suited for separating two 
signal components with frequencies very close to each other but with widely 
differing amplitudes. It should be noted that the choice of the window function 
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directly affects the time and frequency resolutions of the analysis result. While 
higher resolution in general provides better separation of the constituent compo- 
nents within a signal, the time and frequency resolutions of the STFT technique 
cannot be chosen arbitrarily at the same time, according to the uncertainty principle 
(Cohen 1989). Specifically, the product of the time and frequency resolutions is 
lower bounded by 

Ar.A/>^ (2.9) 

where At and Af denote the time and frequency resolutions, respectively. Analyti- 
cally, the time resolution At is measured by the root-mean-square time width of the 
window function, defined as 

^,.^/^W^ (2.10) 

Similarly, the frequency resolution A/ is measured by the root-mean-square band- 
width of the window function, and is defined as (Rioul and Vetterli 1991) 

^f=m9Mpf_ (2.11) 

In (2.11), G{f) is the Fourier transform of the window function g{i). As 
an example, the Gaussian window function g{t) = e""' ^' (with a being a constant 
and T controlling the window width) has the time and frequency resolutions of 
At = T/(2i/a) and A/' — \foLJ(x ■ 2n), respectively. As a result, the time-frequency 
resolution provided by the Gaussian window when analyzing a signal x(t) 
is At • A/' — 1/471. As the time and frequency resolutions of a window function 
are dependent on the parameter t only, once the window function is chosen, 
the time and frequency resolutions over the entire time-frequency plane are 
fixed. Illustrated in Fig. 2.7 are two scenarios where the products of the time and 
frequency resolutions of the window function (i.e., the area defined by the 
product of At • A/ ) are the same, regardless of the actual window size (t or t/2 ) 
employed. 

The effect of the window size t on the time and frequency resolutions is 
illustrated in Fig. 2.8, where STFT with the Gaussian window was performed 
on the signal shown in Fig. 2.1. Altogether three different window sizes (i.e., 1.6, 
6.4, and 25.6 ms) were chosen. While the smallest window width of 1.6 ms 
has provided high time resolution in separating the four pulse trains contained 
in the signal, as illustrated in Fig. 2.8a, its frequency resolution was too low 
to differentiate the two time-overlapped transient elements within each group. 
As a result, the frequency elements 1,500 and 650 Hz are displayed as one 
lumped group on the time-frequency plane. In contrast, the largest window width 
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Fig. 2.7 Time-frequency resolutions associated with the STFT technique, (a) Window size t and 
(b) window size t/2 



of 25.6 ms provided good frequency resolution to illustrate the two frequency 
components in Fig. 2.8b. However, the time-resolution was insufficient to differen- 
tiate the four pulse trains that are timely separated with a 12-ms interval. The 
best overall performance is given by the window width of 6.4 ms, shown in 
Fig. 2.8c, which allowed for all of the transients to be adequately differentiated 
on the time-frequency plane. Given that the specific frequency content of an 
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Fig. 2.8 Results of the STFT 
of the signal using three 
different window sizes. 
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experimentally measured signal is generally not known a priori, selection of 
a suitable window size for effective signal decomposition using the STFT technique 
is not guaranteed. The inherent drawback of the STFT motivates researchers 
to look for other techniques that are better suited for processing nonstationary 
signals. One of such techniques, which is the focus of this book, is the wavelet 
transform. 
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Fig. 2.9 Alfred Haar 
(1885-1933) 




Fig. 2.10 The rectangular 
basis function 
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2.3 Wavelet Transform 



From a historical point of view, the first reference to the wavelet goes back to 
the early twentieth century when Alfred Haar (Fig. 2.9) wrote his dissertation titled 
"On the theory of the orthogonal function systems" in 1909 to obtain his doctoral 
degree at the University of Gottingen. His research on orthogonal systems of 
functions led to the development of a set of rectangular basis functions (Haar 
1910), as illustrated in Fig. 2.10. Later, an entire wavelet family, the Haar wavelet, 
was named on the basis of this set of functions, and it is also the simplest wavelet 
family developed till this date. 

Essentially, Haar's basis function consists of a short positive pulse followed by 
a short negative pulse, and it was used to illustrate a countable orthonormal system 
for the space of square-integrable functions on the real line (Haar 1910). Later, the 
Haar basis function was applied to compress images (DeVore et al. 1992). 

Little advancement in the field of wavelets was reported after Haar's work, until 
a physicist, Paul Levy (Fig. 2. 11), investigated the Brownian motion in the 1930s. 
He discovered that the scale-varying function, that is, the Haar basis function, 
was better suited than the Fourier basis functions for studying subtle details in 
the Brownian motion. In addition, the Haar basis function can be scaled into 
different intervals, such as the interval [0, 1] or the intervals [0, 1/2] and [1/2, 1], 
thereby providing higher precision when modeling a function than that provided by 
the Fourier basis function, as it can only have one interval [-oo, -oo]. 
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Fig. 2.11 Paul Levy 
(1886-1971) 




Fig. 2.12 Jean Morlet 
(1931-2007) 




While several individuals, such as John Littlewood, Richard Paley (Littlewood 
and Paley 1931), Elias M. Stein (Jaffard et al. 2001), and Norman H. Ricker 
(Ricker 1953) have contributed, from the 1930s to the 1970s, to advancing 
the state of research in wavelets as it is called today, major advancement in the 
field was attributed to Jean Morlet (Fig. 2.12) who developed and implemented 
the technique of scaling and shifting of the analysis window functions in analyz- 
ing acoustic echoes while working for an oil company in the mid 1970s (Mackenzie 
2001). By sending acoustic impulses into the ground and analyzing the 
received echoes, the existence of oil beneath the earth crust as well as the thickness 
of the oil layer can be identified. When Morlet first used the STFT to analyze 
these echoes, he found that keeping the width of the window function fixed 
did not work. As a solution to the problem, he experimented with keeping 
the frequency of the window function constant while changing the width of the 
window by stretching or squeezing the window function (Mackenzie 2001). The 
resulting waveforms of varying widths were called by Morlet the "Wavelet", and 
this marked the beginning of the era of wavelet research. As a matter of fact, 
the approach that Morlet used was similar to what Haar did before, but the 
theoretical formation of the wavelet transform was first proposed only after 
Jean Morlet teamed up with Alex Grossmann to work out the idea that a signal 
could be transformed into the form of a wavelet and then transformed back into its 
original form without any information loss (Grossmann and Morlet 1984). 
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Fig. 2.13 Illustration of wavelet transform 



In contrast to the STFT technique where the window size is fixed, the wavelet 
transform enables variable window sizes in analyzing different frequency compo- 
nents within a signal (Mallat 1998). This is realized by comparing the signal with a 
set of template functions obtained from the scaling (i.e., dilation and contraction) 
and shift (i.e., translation along the time axis) of a base wavelet i/'(f) and looking for 
their similarities, as illustrated in Fig. 2.13. 

Using again the notation of inner product, the wavelet transform of a signal x{t) 
can be expressed as 



wt{s, x) = (x, \j/ ) = 



1 



x{tW 



t ~ X 

s 



dt 



(2.12) 



where the symbol s>Q represents the scaling parameter, which determines the time 
and frequency resolutions of the scaled base wavelet i//(/ — x/s). The specific values 
of s are inversely proportional to the frequency. The symbol x is the shifting 
parameter, which translates the scaled wavelet along the time axis. The symbol 
!/'*(•) denotes the complex conjugation of the base wavelet \l/{t). As an example. 



if the Morlet wavelet i/'(r) — e 
version will be expressed as 



^ Jlnfot p-C-'^lp-) 



is chosen as the base wavelet, its scaled 



p'^i/oV 



(2.13) 
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Fig. 2.14 Time and frequency resolutions of the wavelet transform (.$2 = 2.si ) 

with the parameters /o, a, and /? all being constants. The corresponding time and 
frequency resolutions of the Morlet wavelet will be calculated as A? — sfS /2^/a. and 
Af = y/a/{s ■ 2Tt[i), respectively. These expressions indicate that the time and 
frequency resolutions are directly and inversely proportional to the scaling param- 
eter s, respectively. In Fig. 2. 14, variations of the time and frequency resolutions of 
the Morlet wavelet at two locations on the time-frequency (t-f) plane, {ri,ri/si) 
and (t2,77/.S2), are illustrated. 

It is seen that changing the scale from s at the location (ti, 77/^1) to S2 = 2si at 
(f 2, v/^2) decreases the time resolution by half (as the width of the time window is 
doubled) while doubling the frequency resolution (because the width of the fre- 
quency window is reduced to half). Through variations of the scale s and time shifts 
(by t) of the base wavelet function, the wavelet transform is capable of extracting 
the constituent components within a time series over its entire spectrum, by using 
small scales (corresponding to higher frequencies) for decomposing high frequency 
components and large scales (corresponding to lower frequencies) for low fre- 
quency components analysis. As an example, Fig. 2.15 illustrates the result of the 
wavelet transform performed on the signal shown in Fig. 2. 1 , using the Morlet base 
wavelet. It is evident that all the transient components are differentiated in the time 
scale domain. 

Following up the impactful work of Morlet and Grossmann, numerous 
researchers have invested significant effort in further developing the theory of 
wavelet transform. Examples include Stromberg's early work on discrete wavelets 
in 1983 (Stromberg 1983), Grossmann, Morlet, and Paul's work on analyzing 
arbitrary signals in terms of scales and translations of a single base wavelet function 
(Grossmann et al. 1985, 1986), and Newman's work on Harmonic wavelet trans- 
form in 1993 (Newland 1993). Perhaps the most important step that has led to the 
prosperity of the wavelets was the invention of multiresolution analysis by 



30 



2 From Fourier Transform to Wavelet Transform: A Historical Perspective 




1 



Fig. 2.15 Wavelet transform of the signal 



Fig. 2.16 Stephane Mallat 




Stephane Mallat (Fig. 2.16) (Mallat 1989a, b, 1999) and Yves Meyer (Fig. 2.17) 
(Meyer 1989, 1993). Such an invention was introduced by a paper written by Meyer 
on orthogonal wavelets, entitled "Orthonormal wavelets" (Meyer 1989). 

The key to multiresolution analysis is to design the scaling function of the 
wavelet such that it allowed other researchers to construct their own base wavelets 
in a mathematically grounded fashion. As an example, Ingrid Daubechies 
(Fig. 2.18) created her own family of wavelet, the Daubechies wavelets, around 
1988 (Daubechies 1988, 1992), on the basis of the concept of multiresolution. 
Figure 2.19 illustrates one member of the Daubechies wavelet family: Daubechies 
2 base wavelet. This type of wavelet is orthogonal and can be implemented using 
simple digital filtering techniques. 

Since then, a proliferation of activities on wavelet transform and its applications 
in many fields has been seen. These include image processing, speech processing, 
as well as signal analysis in manufacturing which is the focus of this book. 
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Fig. 2.17 Yves Meyer 
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Fig. 2.18 Ingrid Daubechies 




Fig. 2.19 Daubechies 2 base 
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Chapter 3 

Continuous Wavelet Transform 



Wavelet transform is a mathematical tool that converts a signal into a different 
form. This conversion has the goal to reveal the characteristics or "features" hidden 
within the original signal and represent the original signal more succinctly. A base 
wavelet is needed in order to realize the wavelet transform. The wavelet is a small 
wave that has an oscillating wavelike characteristic and has its energy concentrated 
in time. Figure 3.1 illustrates a wave (sinusoidal) and a wavelet (Daubechies 
4 wavelet) (Daubechies 1992). 

The difference between a wave and a wavelet is that a wave is usually smooth 
and regular in shape, and can be everlasting, while in contrast, a wavelet may be 
irregular in shape, and normally lasts only for a limited period of time. A wave (e.g., 
sine and cosine) is typically used as a deterministic template in the Fourier 
transform for representing a signal that is time-invariant or stationary. In compari- 
son, a wavelet can serve as both a deterministic and nondeterministic template for 
analyzing time-varying or nonstationary signals by decomposing the signal into a 
2D, time-frequency domain. 

Mathematically, a wavelet is a square integrable function i/^(r) that satisfies the 
admissibility condition (Chui 1992; Meyer 1993; Mallat 1998): 

^^^<5f/<oo (3.1) 

In this equation, ^(/) is the Fourier transform (i.e., frequency domain expression) of 
the wavelet function i/'(f) (in the time domain). The admissibility condition implies that 
the Fourier transform of the function i/'(?) vanishes at the zero frequency; that is, 

l^(/) I' 1^=0=0 (3.2) 

This means that the wavelet must have a band-pass like spectrum. A zero at the zero 
frequency also means that the average value of the wavelet \j/{t) in the time domain 
is zero: 

/■oo 

\l/{t)dt = (3.3) 
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3 Continuous Wavelet Transform 





a sinusoidal wave a wavelet 

Fig. 3.1 Representation of a wave and a wavelet, (a) A sinusoidal wave and (b) a wavelet 

Equation (3.3) indicates that the wavelet must be oscillatory in nature. Through 
the process of dilation (i.e., stretching or squeezing the wavelet function by l/s) and 
translation, (i.e., shift along time axis by i), a family of scaled and translated 
wavelets can be obtained as 



*"(" = j!H^)' 



.?>0, T € 7? 



(3-4) 



The purpose of having the factor -j- in (3.4) is to ensure that the energy of the 
wavelet family will remain the same under different scales. For example, by 
assuming that the energy of the wavelet function i/'(r) is given by 



\i^{t)\'dt (3.5) 

the energy of the scaled and translated wavelets ^sA^) '^^^ tie calculated as 

1 



irA-^ 



dt = 



dt = ?. 



(3.6) 



As a result, the energy of the original base wavelet (/'(?) and the scaled and 
translated wavelets remains the same. The relationship between xjjif) and ^sA'^) i^ 
illustrated in Fig. 3.2, and the process through which a signal is decomposed by 
analyzing it with a family of scaled and translated wavelets such as i^j ^(r) is called 
the wavelet transform. 

Generally, the wavelet transform can be represented in continuous (i.e., contin- 
uous wavelet transform (CWT)) as well as in discrete forms (i.e., discrete wavelet 
transform). The CWT of a signal x{t) is defined as (Rioul and Vetterli 1991) 



wt{s, t) 



1 

7^ 



4tw(-j^) 



dt 



(3.7) 



where (/'*(•) is the complex conjugate of the scaled and shifted wavelet function 

'A(-)- 

As shown in this definition, the CWT is as an integral transformation. In this sense, 
it is similar to the Fourier transform in that integration operation will be performed in 
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Fig. 3.2 Illustration of translation (by the time constant x) and dilation (by the scaling factor s) 



both transforms. On the other hand, as the wavelet contains two parameters (scale 
parameter s and translation parameter t), transforming a signal with the wavelet 
basis means that such a signal will be projected into a 2D, time-scale plane, instead 
of the ID frequency domain in the Fourier transform. Furthermore, because of the 
localization nature of the wavelet, the transformation will extract features from the 
signal in the time-scale plane that are not revealed in its original form, for example, 
what specific bearing defect-related spectral components existed at what time. 



3.1 Properties of Continuous Wavelet Transform 

Equation (3.7) indicates that the CWT is a linear transformation, characterized by 
the following properties. 



3.1.1 Superposition Property 

Suppose x{t), y{t) e L^{R), and ki and ^2 are constants. If the CWT of x{t) is 
wt_,{s, t) and the CWT of y{t) is wty{s, t), then the CWT of z{t) = kix{t) + %(r) 
is given by 
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wfz(5, t) = kiwt^^{s, t) + k2Wty{s, x) (3.8) 

Proof: Let wt^{s, t) = ^ /-^(O'A* (^)^^ and wty{s, t) = ^ /^(O'A* (^)d^; 
then 



wuXs,i)=^J z{tW(-^) 



dt 



= ki^ j x{t)\i/*{-—^dt + k2^ j y{tW{^ ^-^dt 

= kiwtj.{s, t) + k2Wty{s, t) 

This proves the superposition property of the CWT. 



3.1.2 Covariant Under Translation 

Suppose the CWT of x{t) is wtt{s, t); then the CWT of x{t — to) is wtj{s, r — to). 
The proof of this property is shown below: 
Proof: Let j<'{t) ^ x{t ~ to); then 

wry(i,T)=-= I x{t^to)\l/*{^-^dt (3.10) 

Let f = t — to; then 

wt,,{s,z) = ^^ I x(/)iA* ( ^ ^ ^^' " ^ \ dt' =wt,{s, T - to) (3.11) 

This means that the wavelet coefficients of j:(f — to) can be obtained by translating 
the wavelet coefficients of x(;) along the time axis with to- 



3.1.3 Covariant Under Dilation 

Suppose the CWT of jc(r) is wt,,{s, t); then the CWT of jc(^) is Vaw'r,(f ,^) 
Proof: Let x'{t) = x(j); then 

.tAs,r) = ^jAW(-^)dt ^ ^JxQr(-^)dt (3.12) 
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Let t' = -; then (3.12) becomes 

^ ^ ' ' (3.13) 



|/xw(^)^/=y^-.e,3 



Equation (3.13) indicates that, when a signal is dilated by a, its corresponding 
wavelet coefficients are also dilated by a along both the scale and time axes. 



3.1.4 Moyal Principle 

Suppose x{t), y{t) e L^{R). If the CWT of x{t) is wt^{s, x) and the CWT of y{t) is 
wty{s, t); that is, 

w?,(^,T) = (A-(r),f„(r)) (3.14a) 

wty{s,x) = {y{t),^lI,M (3-14b) 

then 

{wt,{s,x),wty{s,x)) ^C,^{x{t),y{t)) (3.15) 

where C,/, = Jg^ ' H df. The proof of this property is as follows. 

/"roo/ According to the Parseval's theorem, the inner product of two functions in 
time domain can be equivalently given in the frequency domain as 

{x{t),y{t))=^jx{f)Y*{f)df (3.16) 

Consequently, we have 

wtM t) = {<t),^iJ,M - ^ / X{f)X,{f)df (3.17a) 

my{s, x) = {y{t),^lj,M ^^J Y{f)X,{f)df (3.17b) 

From (3.4), we know that il/,^{t) — ^i/'(^). Therefore, 

'i'sAf) = V~s'i'{sf)e-^^''^'' (3.18a) 
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XAf) = Vs'i'*{sf)e'^''^' (3.18b) 

By incorporating (3.18b) into (3.17) and utilizing the following integral relation, 



/ 



-j2M(f-f')T, _ 



dx^2n5{f~f') (3.19) 



the left side of (3.15) can be expanded as 



s ff ds , 



(wf,(.,T),wf,(.,T)) =-yy -^x{f)r{f)^{sm*{sf)df 



1 
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ds 



(3.20) 
^f)y*{f)df 



As /ffi^at j^ ^ j\^^^d{sf) = /^^^(/') = C^, (3.20) can be expressed as 

{wt,{s,x),wty{s,T)) = C,^^ J X{f)Y*{f)df ^ C^{x{t),y{t)) (3.21) 

This proves the existence of Moyal principle for CWT. It is noted that C^ is 
actually the admissibility condition of the wavelet. Only when this condition is 
satisfied can the Moyal principle exist. Furthermore, if x{t) ^y{t), then (3.15) 
becomes 

/ -^ \wt_,{s,Tf\dx = C^ \4t)\^dt (3.22) 

Jo ^ J — OC J —OQ 

This means that the integral of the square of wavelet coefficients is proportional 
to the energy of the signal. 



3.2 Inverse Continuous Wavelet Transform 

A transformation is considered to be meaningful in practice only when its 
corresponding inverse transformation exists. The same principle applies to the 
CWT. It can be shown that, as long as the wavelet satisfies the admission condition 
as defined in (3.1), the inverse CWT will exist. This means that a signal can be 
perfectly reconstructed from its corresponding wavelet coefficients, which can be 
written as 



X(0=— / ^ / Wt,{s,x)ll/,,{t)dT 
<-'i/< Jo J J-oo 

1 rds f°^ , . 1 , /r - T\ 
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1 rds 

r I .2 I ^'^^(i,-!:)^!^! -)dx 

C.A Jo ^ J-oo V'5 \ s J 

where C,/, = /(J^ ' f' df<oo is the admission condition of the wavelet i/'(f). 
The proof of (3.23) is shown below: 
Proo/ Assume that xi(f) = A-(r), andx2(f) = 8{t - /'). As (x(r),^(r- /)) ^ x{t'), 

C^xit')^C,i,{x{t),5{t-t')) (3.24) 

According to the Mayol principle shown in (3.15), (3.24) can be further written as 

C^x{t') = {wt,{s,r),wts^,_f^{s,'c)) 

-J / wt,{s,-c)wtlf^,_,,^{s,T)dx 
^ J 

ds f 

'^ J 

-^ / wt,-{s,x)(il/,^,{t),5{t-t'))dr 

'5 J 



'^ 



-T I ^'tx{s,r)ilj,,{t')dx 



ifsL f/-^.^-(^'^)^('^v^ 



This illustrates that the inverse CWT exists. 



3.3 Implementation of Continuous Wavelet Transform 

To implement the CWT, two approaches can be taken. The first approach is to 
obtain the wavelet coefficients directly from (3.7). The computation procedure is as 
follows: 

1 . The wavelet is placed at the beginning of the signal, and set s — \ (the original, 
base wavelet). 

2. The wavelet function at scale "1" is multiplied by the signal, integrated over all 
times, and then multiplied by I/a/s. 

3. Shift the wavelet to ? = t, and get the transform value at r = t and .? = 1. 

4. Repeat the procedure until the wavelet reaches the end of the signal. 

5. Scale s is increased by a given value, and the above procedure is repeated for all s. 

6. Each computation for a given s fills the single row of the time-scale plane. 

7. Wavelet transform is obtained if all s are calculated. 
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The second approach to implementing the CWT is on the basis of the convolu- 
tion theorem, which states that the Fourier transform of the convolution operation 
on two functions in the time domain is the product of the respective Fourier 
transforms of these two functions in the frequency domain (Bracewell 1999). The 
Fourier transform of (3.7) is expressed as 



WT{sJ) ^ F{wt{s,x)} = 



1 
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x{tW 



dt 



,-j2'r/'r 



Applying the convolution theorem to (3.26) leads to 

WT{sJ) = ^sX{f)T*{sf) 



dx (3.26) 



(3.27) 



where X{f) denotes the Fourier transform of x(f) and '?'*(•) denotes the Fourier 
transform of i/'*(). By taking the inverse Fourier transform, (3.27) is converted 
back into the time domain as 



wt{s,t) = F-'{WT{sJ)} - ^sF-'{X{f)^'*{sf)} 



(3.28) 



where the symbol /^"' [•] denotes the operator of inverse Fourier transfomi. There- 
fore, the implementation of the CWT can be realized through a pair of Fourier and 
inverse Fourier transforms. 

Figure 3.3 illustrates the procedure for implementing the CWT. After taking the 
Fourier transform of the signal x(r) and the scaled base wavelet i//(5, t) to obtain 
their frequency information X{f) and 'P{sf), respectively, the inner product 
between X(f) and complex conjugate of f (.yf ) is calculated. Next, the CWT of 
the signal x{t), denoted as cwt{s,t), is obtained by taking the inverse Fourier 
transform on the inner product of WT{s,f). 



Fig. 3.3 Procedure for 
implementing the continuous 
wavelet transform 
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Fig. 3.4 Mexican hat wavelet {left) and its magnitude spectrum (right) 

3.4 Some Commonly Used Wavelets 

This section introduces several commonly used wavelets for performing the CWT. 

3.4.1 Mexican Hat Wavelets 

The Mexican hat wavelet is a normalized, second derivative of a Gaussian function, 
which is mathematically defined as (Mallat 1998) 



m = 



1 



Ina^ 



1 — ^ I e2c- 



(3.29) 



Figure 3.4 illustrates the Mexican hat wavelet and its associated magnitude 
spectrum. The Mexican hat wavelet is often called the Ricker wavelet in geophys- 
ics, where it is frequently employed to model seismic data (Zhou and Adeli 2003; 
Erlebacher and Yuen 2004). 



3.4.2 Morlet Wavelet 



The Morlet wavelet is defined as (Grossmann and Morlet 1984; Teolis 1998) 

rjj^{t)=^^^-f''e-r. (3.30) 



/Tt/b 

where /b is the bandwidth parameter and/c denotes the wavelet center frequency. As 
an example, Fig. 3.5 illustrates the complex Morlet wavelet function and its 
corresponding magnitude spectrum when /t, = 1 Hz and /^ — 1 Hz. The Morlet 
wavelet has been widely used for identifying transient components embedded in a 
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Fig. 3.5 Morlet wavelet (left) and its magnitude spectrum {right): fi, = 1 Hz and/e = 1 Hz 
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Fig. 3.6 Gaussian wavelet (left) and its magnitude spectrum (right): N = 2 

signal, for example, bearing defect-induced vibration (Lin and Qu 2000; Nikolaou 
and Antoniadis 2002; Yan and Gao 2009). 



3.4.3 Gaussian Wavelet 



Mathematically, a Gaussian function is expressed as (Teolis 1998) 



f{t)=e-J>e-' (3.31) 

Taking the Mh derivative of this function yields the Gaussian wavelet as 



i/'g(0 = Cn 






(3.32) 



where A^ is an integer parameter ( > 1 ) and denotes the order of the wavelet, and c^ is 
a constant introduced to ensure that ll/'^^HOII = 1- Figure 3.6 illustrates the 
Gaussian function with its magnitude spectrum for the case ofN = 2. The Gaussian 
wavelet is often used for characterizing singularity that exists in a signal (Mallat 
and Hwang 1992; Sun and Tang 2002). 



3.4 Some Commonly Used Wavelets 

3.4.4 Frequency B-Spline Wavelet 
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A frequency B-spline wavelet is defined as (Teolis 1998) 



'/'b(0 = \/h 



sin c 



'ht 



gJ27lfcl 



(3.33) 



where /i, is the bandwidth parameter, f^ denotes the wavelet center frequency, and 
p is an integer parameter {>2). The notation of sin c(-) is a sin c function, which is 
defined as 



sinc(x) 



1 X = 

^^*s^ otherwise 



(3.34) 



As an example, a B-spline wavelet for the case of/b — 1 Hz,/c — 1 Hz, and p = 2 
together with its corresponding magnitude function is shown in Fig. 3.7. The appli- 
cation of the frequency B-spline wavelet has been seen in biomedical signal analysis 
(Moga et al. 2005; Fard et al. 2007). 



3.4.5 Shannon Wavelet 

The Shannon wavelet is a special case of the frequency B-spline wavelet foip— 1: 

'AsW = v^sinc(/br)e'2'^^^' (3.35) 

where /t, is the bandwidth parameter and /(, denotes the wavelet center frequency. 
The notation of sin c(-) is a sin c function and defined in (3.34). Figure 3.8 
illustrates the Shannon wavelet for the case of /b = 1 Hz and/c = 1 Hz, with its 
corresponding magnitude spectrum. The Shannon wavelet has been shown for the 
analysis and synthesis of the 1// processes (Shusterman and Feder 1998). 
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Fig. 3.7 Frequency B-spline wavelet (left) and its corresponding magnitude spectrum (right): 
p = 2,/b = 1 Hz, and/c = 1 Hz 
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Fig. 3.8 Shannon wavelet (left) and its corresponding magnitude spectrum (right): f^ = 1 Hz and 
/c = 1 Hz 




-5-4-3-2-1 1 2 3 4 5 

Time (Sec) 



1.5 -1 -0.5 0.5 1 

Frequency (Hz) 



1.5 2 



Fig. 3.9 Harmonic wavelet (left) and its magnitude spectrum (right): m = 0.5 Hz and « = 1.5 Hz 

3.4.6 Harmonic Wavelet 

The harmonic wavelet is defined in the frequency domain as (Newland 1994a, b; 
Yan and Gao 2005) 










elsewhere 



(3.36) 



where the symbols m and n are the scale parameters. These parameters are real but 
not necessarily integers. Furthermore, the bandwidth /(, and center frequency /c are 
determined by the scale parameter as 



fh = n- m; 



fc 



n -\- m 

2 



(3.37) 



As an example, Fig. 3.9 shows the harmonic wavelet function and its 
corresponding magnitude spectrum for the case of ot = 0.5 and n ~ 1.5. The 
harmonic wavelet was first designed by Newland for analyzing vibration signals 
(Newland 1993). Later, the application of harmonic wavelet has been extended to 
heart rate variability analysis (Bates et al. 1997) and image denoising (Iftekhar- 
uddin 2002). 



3.5 CWT of Representative Signals 

3.5 CWT of Representative Signals 
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Using the wavelets introduced in Sect. 3.4, the CWT is applied to several typical 
signals, as described below. 



3.5.1 CWT of Sinusoidal Function 

The first signal analyzed is a pure sinusoidal function. Figure 3.10a illustrates a 50 Hz 
sinusoidal signal, and Fig. 3. 10b illustrates the CWT results of the signal. It is seen that 
the 50 Hz component is present all the time throughout the analysis duration. 




0.4 0.6 

Time (s) 

Time domain waveform of a sinusoidal function 




Time (s) 



CWT results of the sinusoidal function 
Fig. 3.10 A sinusoidal function (a) time domain waveform (b) CWT results 
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Time domain waveform of a Gaussian pulse function 
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CWT results of the Gaussian pulse function 
Fig. 3.11 A Gaussian pulse function (a) time domain waveform (b) CWT results 

3.5.2 CWT of Gaussian Pulse Function 

The second signal is a Gaussian pulse function. Figure 3.11a shows a Gaussian pulse 
signal with 10 kHz center frequency. Figure 3.1 lb illustrates the CWT results of the 
Gaussian pulse signal, which is identified in the time-scale domain at around s. 



3.5.3 CWT of Chirp Function 



The last signal is a chirp function. Figure 3. 12a shows an example of a chirp signal. 
It is a linear swept-frequency signal with the instantaneous frequency being 50 Hz 
at time zero. The instantaneous frequency 10 Hz is achieved after 1 s. Figure 3.12b 
illustrates the CWT results of the chirp signal, and the change of frequency along 
with time can be clearly seen. 



3.7 References 
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CWT results of the chirp function 
Fig. 3.12 A chirp function (a) time domain waveform (b) CWT results 

3.6 Summary 

This chapter begins with definition of a wavelet, where the admissibility condition 
that a wavelet should satisfy is emphasized. The CWT and its related properties 
are then introduced. Two approaches for implementing the CWT are discussed 
in Sect. 3.3, followed by the introduction of some commonly used wavelets in 
Sect. 3.4. Typical signals are analyzed using the CWT and the results are shown 
in Sect. 3.5. 
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Chapter 4 

Discrete Wavelet Transform 



According to the definition of the continuous wavelet transform (CWT) given in 
(3.7), Chap. 3, the scale parameter s and translation parameter t can be varied 
continuously. As a result, performing the CWT on a signal will lead to the genera- 
tion of redundant information. Although the redundancy is useful in some applica- 
tions, such as signal denoising and feature extraction where desired performance is 
achieved at the cost of increased computational time and memory size, other 
applications may need to emphasize reduced computational time and data size, 
for example, in image compression and numerical computation. Such requirements 
illustrate the need for reducing redundancy in the wavelet coefficients among 
different scales as much as possible, while at the same time, avoiding sacrificing 
the information contained in the original signal. This can be achieved by parameter 
discretization, as described in the following section. 



4.1 Discretization of Scale and Translation Parameters 

The approach to reducing redundancy is to use discrete values of scale and 
translation parameters. A natural way to implement this is to use a logarithmic 
discretization of the scale s and then link it to step size taken between the values of 
translation parameter t. This type of discretization is expressed as 



< ■ '° ,. 5o<i, Toy^o, ,/ez, kez (4.1) 

[ T = ^10^0 

where the symbol Z denotes an integer. The corresponding family of the base 
wavelet is then expressed as 




(4.2) 
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Generally, the values of 5o = 2 and to = 1 are adopted (Addison 2002). Conse- 
quently, (4.2) is expressed as 

As a result, the wavelet transform of a given signal x(t) is obtained as 

1 r^ ft — k7i\ 

wt(j, k) = (a-(0, ^,,,(r)) = ^ y ^ ^tW [—^) dt (4.4) 

where the symbol (•) denotes inner product operation. Equation (4.4) poses the 
following two questions: 

1 . Can the results of the discretized wavelet transform represent the entire infor- 
mation content of the signal jr(0? In other words, can the wavelet coefficients 
obtained as a result of the wavelet transform be used to peifectly reconstruct the 
original signal x(t)l 

2. Can any signal x{t) be expressed as the summation of i//, ,(.(?), in the form of the 
following equation? 

x{t) = J2Cj,tikjAt) (4.5) 

In (4.5), Cj^it represents the coefficient of the discrete wavelet transform (DWT), 
which corresponds to wt{i, k) in (4.4). Finally, if the answer to question (2) is "yes," 
then how can we calculate the coefficient C,,<r? 

Assume that the answer to question (1) is "yes," and we can select i/'s^(0 and 
discretize s and t properly. Then, there must exist a function ij/: j^{t), defined as the 
dual function of xjjji^if), that can be used for reconstructing the signal x{t) described 
in question (1), as follows: 

x(r) = ^(x(r)>^.,(r)).AMW (4-6) 

where the term 4'jk{f) ^^^ ^^ obtained by performing the scaling and translation 
operations on i/'(f) as 

1 ~ /t~ kl' 



On the basis of the above assumption, if there exists another signal y{t), we can 
obtain the irmer product of the signals x{t) and y{t) as shown in (4.8). Note that the 
symbol * indicates the complex conjugate operator: 



4. 1 Discretization of Scale and Translation Parameters 5 1 



\ j.k I 



Y.(4t),h^-{t)){hAt)Mt)) 



j-k 



^E(^(0,iA,,.W)('A,>W,^W) 



(4.8) 



J,k 



Equation (4.8) implies that 

}'W-E(>'W''AmW)'A,>w (4.9) 

which means that the answer to question (2) is also positive. It further implies that 
the coefficient Cj,k can be calculated as 

q, = MO>>..W) (4.10) 

Therefore, once question (1) is answered, the answer to question (2) can be 
readily derived from it. The answer to the question (1) can be presented in 
mathematical terms as follows. 

If a set of wavelet coefficients {x{t),il/j(.{t)) exists that describes complete 
information of the signal x{t), then the following statements must hold: 

1. When xi{t) = X2{t), the inner product of Xi{t) and the scaled and translated 
wavelet i/'/^(0 can be expressed as 

(A-i(r),iA,.,(0) = (A-2(f)>y,,W> (4.11) 

2. For x{t) = 0, we have 

(A-(r),^,.,(r))-0 (4.12) 

3. When xi(f) is very close to X2{t), the corresponding wavelet coefficients 
{xi{t),\jj:i^{t)) must be close to (a2(?), i/';^,(f)). In other words, if \\xi{t) — X2{t)\\ 
is very small, then ^ |(A'i(r), i/^ ^.(?)) — (A-2(r), (/'.^.(O)! must be very small, too. 

j-k 

Mathematically, this can be expressed as 

E |(a-i(0>mW> - (A-2W,iA,.,,(r))|' < B\\x,{t) - X2{t)\\\ BeR+ (4.13) 

j,k 
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that is, 

El«0,'AM-W>f <5||a-W|P (4.14) 

In (4.13), the symbol R^ denotes the set of positive real numbers, and 5 is a 
positive real number. 

Furthermore, if we want to reconstruct x(t) from the wavelet coefficient 
{x{t), \jj:i.{t)), the following condition must hold: 

When {x\{t), \ijjit{t)) is very close \.o{x2{t), ipj i^{t)), xi{t) must be very close to 
X2{t), too, which leads to 

A\\x{t)f<J2\{x{t),^j,{t))\\ AeR+ (4.15) 

where A is a positive real number. 

Combining (4.15) with (4.14), we obtain the following equation: 

A\\xit)f<J2\{4t),^^^,it))f<B\\x{t)f, A,BeR+ (4.16) 

This ensures that the DWT of a signal x(t) can be obtained. Equation (4.16) 
is called a wavelet frame (Addison 2002). The values of the wavelet frame bounds, 
A and B, depending on both the scale parameter 5 and the translation parameter t 
that are chosen for analysis and the base wavelet function used (Daubechies 1992). 
Particularly, if A — B, the wavelet frame is known as a tight frame. In such a case, 
the signal x{t) can be reconstructed through the inverse discretized wavelet trans- 
form as 



-^W=l E E "^'(J^kWj^kit) (4.17) 

J——OC k——oo 

IfA^B, but the difference between A and B is not too large (Addison 2002), the 
signal x{t) can still be reconstructed as 

■^(0 = j^ E E '^'O' ^)'/'m(^) (4.18) 

The difference between x(t) and x'it) is determined by the values of A and B, 
and becomes small in practice when the ratio of B/A is approaching the value of one. 
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4.2 Multiresolution Analysis and Orthogonal 
Wavelet Transform 

Of the various forms of wavelet discretization, the dyadic discretization with ^o = 2 
and To = 1 has been widely used, as shown in (4.3). This is because it allows the 
selection of the base wavelet to be made in such a way that its corresponding family 
set ij/j i^{t) constitutes an orthogonal basis within the tight wavelet frame, character- 
ized by A = 1. To construct a base wavelet having the characteristics of orthogo- 
nality, the multiresolution analysis (MRA) is presented here as the theoretical 
foundation. 



4.2.1 Multiresolution Analysis 

The concept of MRA was formed when Mallat was working on image processing in 
the 1980s (Mallat 1989a, b). At that time, the idea of studying images simulta- 
neously at different scales had been popular for years already (Witkin 1983; Burt 
and Adelson 1983). This provided the background for using orthogonal wavelet 
bases as a tool to describe the information contained in the image, from coarse 
approximation to high-resolution approximation, and led to the formulation of 
MRA (Mallat 1989a, b). Theoretically, a MRA of the space L^{R) consists of a 
sequence of successive approximation subspaces {VjJ G Z} that satisfies the fol- 
lowing properties: 

1. Monotonicity, that is, • • • C ^2 C Vi C Vo C \^-i C V-2 C • • • (where the 
symbol C denotes a subset operator). This means that the subspace 
{Vj,i £ Z} holds the successive inclusion relationship. 

2. Completeness, that is, n V, = {0}; U Vj = L^{R), where n denotes the 

,/6Z jez 

intersect operator, and U denotes the union operator. This property indicates 

that all the subspaces together form a complete L^ (7?). 

3. Dilation regularity, that is, x{t) e Vj -^ x{2h) e Vo, where -^ denotes "if and 
only if," and e denotes "is an element of." The term / e Z indicates the 
multiresolution aspect of the subspaces {VjJ G Z}. 

4. Translation invariance, that is, x{t) e Vq ^ x{t ~ n) G Vo, for all « e Z 
(with ^ denotes "imply"). 

5. Existence of orthogonal basis: there exists a function </)(?) e Vo, whose 
corresponding closed subspaces {(/>(f — n)}„g2 form an orthogonal basis of the 
zero-scale space Vq', that is, J^ (/>(r — «)</>(? — m)dt — 3„,„. 

The function (f){t) is the scale function, whose translated version </>^(?) = (t>{t — k) 
satisfies the condition of ((/(^.(r), (/>^j(r)) = 5i^j^'{k,k' e Z). The zero-scale space Vq 
is composed of a set of closed subspaces, formed by <jf)^(f) and is denoted as 
Vo = span{(/)(r-^)}. 

k 
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Fig. 4.1 Inclusion 
relationship among closed 
subspaces {VjJ £ Z} 



Vo 3 V, 3 V, =)V3 




Fig. 4.2 Illustration of 
wavelet subspaces 



Wi 1W2IW3 iVj yo =^ Vi ^ y2 ^ Vj 




From the above description, we know that all the closed subspaces {VjJ G Z} 
are formed from the same scale function (/)(?) with different translation values, and 
the relationship among all the subspaces is illustrated in Fig. 4.1. It can be seen that 
the closed subspaces {VjJ G Z} hold the inclusion relationship, and they are not 
orthogonal. As a result, the scale function family 4>jii{t) = 2^^l^<f){2^h — k) does 
not hold the orthogonal property; that is, {4> j kif)} jez kez cannot be used as an 
orthogonal basis in L (R) space. 

To find the orthogonal bases in the L^{R) space, we can define Wj (j <E Z) as the 
orthogonal complement of V, in V,_i, as illustrated in Fig. 4.2. 

We can then write as follows: 



Vj-,=Vj®Wj 



(4.19) 



and 



where the symbol 
orthogonal operator. 



Wj±Wf, fox j^f (4.20) 

denotes direct summation operator, and _L denotes the 
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It follows that, for J < ,/, we can have the following relationship: 

V. = y,e-'grV,,_, (4.21) 

A-=0 

where all the subspaces Wj (j G Z) are orthogonal, and they form the L (R) space as 

L\r) = © Wi (4.22) 

Furthermore, the Wj spaces inherit the scaling property from the V, (Daubechies 
1992); that is, 

x{t) €Wo^ x{l-h) e Wj (4.23) 

Therefore, if {i/^y ^. ^ e Z} is a set of orthogonal bases in Wq space, then accord- 
ing to (4.23), for the scale / e Z, {ij/jj^ = 2"'/^i/^(2"-'V - k); ^ e Z} is a collection 
of orthogonal bases in the W, space. Accordingly, the entire collection of 
{4'jkl J & ^: k £ Z} forms the sets of orthogonal bases in L (R) space, and we 
call the function il/{t) the wavelet function, and the Wj space in (4.23) denotes the 
wavelet space in scale j. 



4.2.2 Orthogonal Wavelet Transform 

From the definition of the MRA, we know that 

VQ^Vi®Wi^V2®W2 + Wi^Vi®Wi®W2 + Wx = ... (4.24) 

Therefore, for a given signal x{t) e Vq, where Vq is defined as zero-scale space, we 
can decompose it into two parts (the detailed information in W\ and the approximate 
information in V\). The approximate information in Vi can then be further decom- 
posed to get the next level of detailed information in W2 and approximate information 
in V2, respectively. Such a decomposition process can be repeated until the designed 
scale y is reached. This, in a nutshell, is how a DWT is implemented. 

Mathematically, we can define x{(r) as the approximate information at scale 7 
after the signal x{t) is projected onto the Vj space: 

-^1(0 = E «M-'^*(2"'0 = E «v^a>,m(0 , k&Z (4.25) 

k k 

where 

«,,,= (a-(0,</'m(0) (4.26) 

are called the approximate coefficients. 
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Similarly, when the signal x(t) is projected onto the Wj space, the detailed 
information at scale / is obtained as 

-^d(0 = E djAk{2-^t) = E djAjA')' ^ e Z (4.27) 

k k 

where 

4-,= (x(f),^;,,(0> (4.28) 

are called detailed coefficients. 

Consequently, when a given signal x{t) e L^iR) is decomposed into the set of 
subspaces, 

J 

L^{R)^ E '^i®^-' (4.29) 

with / being any predetermined scale, we will have 

^(^) =il 11 dj,,xl,^,{t) + E au4>dt) (4.30) 

J——OC k——C)0 ^— — oo 

If/ -^ cxD, (4.30) can be simplified as 






^(') = E E '^J^'^hti') (4-31) 

Equation (4.31) is equivalent to (4.17) whsnA ^B ~ I. We know that in such cases 
the wavelet bases are orthogonal (Daubechies 1992). As a result, (4.30) and (4.31) 
express the inverse orthogonal wavelet transform, and (4. 26) and (4.28) express the 
orthogonal wavelet transform. From the above description, we see that the idea of 
orthogonal wavelet transform and MRA has followed the same path; thus, MRA 
provides the theoretical basis for orthogonal wavelet transform. 



4.3 Dual-Scale Equation and Multiresolution Filters 

The inherent relationship between the scale function </»(?) and wavelet function il/{t) 
can be expressed in a dual-scale equation as 

<^(r) = E^(")</'-i."W = V2Y,h{n)^{2t-n) (4.32) 
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m = E^(")'^-i."W = V2Y,g{n)4>{2t~n) (4.33) 

n n 

where 

r/2(n) = ((/), </._!„) 
[gin] = {^A-l,n) 

It should be noted that the dual-scale relationship only exists between two 
successive scales j and / — 1 ; that is, 

'^y.o«-E''(«)'^^-i.«W (4.35) 

n 
n 

Furthermore, the coefficients h(n) and g(n) will not change with the scale j. This 
can be proved as follows: 
Proof: 



JR 

=V2 I <t>{l'Wi2l' - n) rff'(let t' = l-Jf) (4-37) 

= ((/)(r),</._,„(0>=M«) 

Similarly, we can prove that (lA/olO) '/"/-i n(0) — 5(")- This means that the 
coefficients h{n) and g{n) are determined by the scaling function ^(t) and wavelet 
function i/'(?), respectively, and are not related to how we choose the scale /. 
Furthermore, if we perform an integral operation on both sides of (4.35), we obtain 
the following: 

/ (/),.„(r) dt = Y, h{n) I 4>j_,^„{t) dt (4.38) 

JR „ JR 



As 



>._^^^Xt)dt^2-- / (l){2-J+'t~n)dt 

R ' JR 



-7+1; 
''=2' rr f __/ , .__,■ , . 1 



V2 / 2-2 </, (2-V -n)ldt' 

JR 2 



71 //-^^^'^ 
71 //^'"^^^'^ 



(4.39) 
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substituting (4.39) in (4.38) leads to 

^/z(n) = V2 (4.40) 

n 

Similarly, we can perform an integral operation on both sides of (4.36) as 

/ ijyj^oit) dt = ^g(«) / (/),_,,„(0 dt (4.41) 

JR „ Jr 

Given that J^ il/{t) dt = 0, (4.41) can be simplified as 

Y^g{n)^0 (4.42) 



The coefficients h(n) and g{n) are called a pair of low-pass and high-pass 
wavelet filters, which are used to realize the DWT, on the basis of the Mallat 
algorithm, as described below. 



4.4 The Mallat Algorithm 

The dual-scale (4.32) can be rewritten as 

(/.(f) = Y^ h{n)V2(j){2t - n) (4.43) 

n 

Accordingly, the scaled and translation version of (f>{t) can then be expressed as 

0(2"'> ~k)^Y^ h{n)V2(i>{2{2-h ~ k) - n) 

I! 

= Y^ h{n)V2(l){2-J+^t - 2^ - n) 



(4.44) 



Let m = 2k + m; then (4.44) can be rewritten as 

(l){2-Jt -k)=Y Km - 2k)V2(l){2-J+^t - m) (4.45) 

n 

On the basis of the theory of MRA, we can define the following: 



Vj-x ^ span {2(-J+i)/2^(2->+if - k)} (4.46) 
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As a result, a given signal x{t) in the V, _ i space can be expressed as 

c(0 = J2 «,-i,i-2<-^'+''/'<^(2-''+'f - k) (4.47) 



' k 



If such a signal is projected (i.e., decomposed) onto the Vj and Wj spaces, the 
result can be expressed as 

x{t) = Y^ aj^k2''l^(l){2-h -k)+Y dj^k2'Jl^\lj{2-it - k) (4.48) 

k k 

where a, i and (iy i are calculated as 

«y,, - (x(r) , c/)^. ,(0> - / x{t)2-'l^* {2-h - k) dt (4.49) 



4,, = (x(r) , .A, A-(0) = / x(r)2-^'/2,A* (2-^> - k) dt (4.50) 

Substituting (4.45) in(4.49) results in 

aj^k ^Yh{m- 2k) f A-(r)2(--'+i'/20* (^2-J+h - m) dt 

= Yh{m-2k){x{t),4>j_iJ (4.51) 

m 

^Yh{m- 2^)aj_i,„, 

m 

Similarly, (4.50) can be further rewritten as 

dj^t = ^ g(m - 2k) f A-(r)2(--'+')/V* i^-J+^t - m) dt 

= Yh{m- 2k){x{t),^j_iJ (4.52) 

^y^^h{m-2k)dj-i^m 



This means that, through such a pair of filters, the signal x(t) is decomposed into 
low- and high-frequency components, respectively, as (Mallat 1998) 

aj,k = ^ h{m - 2k)aj^u„ 

™ (4.53) 

dj,k = 2^ ,?('« - 2k)aj^i^„, 
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^ 



AAAA 



4iif^ 
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Note: // - Low pass filter; G - High pass filter; A - Approximate information; D - Detailed information 

Fig. 4.3 Procedure of a four-level signal decomposition using discrete wavelet transform. Note: 
H low-pass tilter, G high-pass filter, A approximate information, D detailed information 



In (4.53), ajj; is the approximate coefficient, which represents the low- 
frequency component of the signal, and djj; is the detailed coefficient, which 
corresponds to the high-frequency component. The approximate coefficients 
at wavelet decomposition level j are obtained by convolving the approximate coeffi- 
cients at the previous decomposition level (/ — 1) with the low-pass filter coefficients. 
Similarly, the detailed coefficients at wavelet decomposition level j are obtained by 
convolving the approximate coefficients at the previous decomposition level (j — I) 
with the high-pass filter coefficients. Such a process represents the idea of Mallat's 
algorithm to implement the DWT, and is schematically shown in Fig. 4.3. 

From Fig. 4.3, we see that a signal is decomposed by a four-level DWT. After 
passing through the high-pass and low-pass filters on the first level (level 1), the 
output of the low-pass filter, denoted as the approximate coefficients of the level 1, 
is filtered again by the second-level filter banks. The process repeats itself, and at 
the end of the fourth level decomposition, the signal is decomposed into five feature 
groups: one group containing the lowest frequency components, denoted as the 
approximate information and labeled as AAA4, and four groups containing pro- 
gressively higher frequency components, called the detailed information and 
labeled as AAAD, AAD, AD, and D. The levels \-A correspond to the wavelet scales 
2^ = 2, 2^ = 4, 2^ = 8, and 2'* = 16, respectively. 



4,5 Commonly Used Base Wavelets 



This section introduces several commonly used orthogonal wavelets, which can be 
used as the basis for performing the DWT. 



4.5 Commonly Used Base Wavelets 
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Fig. 4.4 Haar wavelet {left) and its magnitude spectrum (right) 



4.5.1 Haar Wavelet 



The Haar wavelet is mathematically defined as (Haar 1910) 



l/'Haai(0 = 



0<r<i 
otherwise 



(4.54) 



Its function and magnitude spectrum are illustrated in Fig. 4.4. 

The Haar wavelet is orthogonal and symmetric in nature. The property of 
symmetry ensures that the Haar wavelet has linear phase characteristics, meaning 
that when a wavelet filtering operation is performed on a signal with this base 
wavelet, there will be no phase distortion in the filtered signal. Furthermore, it is the 
simplest base wavelet with the highest time resolution given by a compact support 
of one as shown in (4.54) (Daubechies 1992). However, the rectangular shape of the 
Haar wavelet determines its corresponding spectrum with slow decay characteris- 
tics, leading to a low frequency resolution. Examples of using the Haar wavelet for 
manufacturing related work include the stamping process monitoring (Zhou et al. 
2006) and fault detection in dry etching process (Kim et al. 2010). 



4.5.2 Daubechies Wavelet 



The family of the Daubechies wavelets is orthogonal, however, asymmetric, which 
introduces a large phase distortion. This means that it cannot be used in applications 
where a signal's phase information needs to be kept. It is also a compactly 
supported base wavelet with a given support width of 2N - 1 , in which N is the 
order of the base wavelet (Daubechies 1992). In theory. A' can be up to infinity. 
In real-world applications, the Daubechies wavelets with order up to 20 have been 
used. The Daubechies wavelets do not have explicit expression except for the one 
with A' = 1, which is actually the Haar wavelet as discussed above. With an increase 
of the support width (i.e., an increase of the base wavelet order), the Daubechies 
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Fig. 4.5 Daubectiies wavelet {left) and its magnitude spectrum (right), (a) Daubechies 2 base 
wavelet and (b) Daubechies 4 base wavelet 



wavelet becomes increasingly smoother, leading to better frequency localization. 
Accordingly, the magnitude spectra for each of the Daubechies wavelets decay 
quickly, as illustrated in Fig. 4.5, where the Daubechies 2 base wavelet and 
Daubechies 4 base wavelet are used as examples. 

The Daubechies wavelets have been widely investigated for fault diagnosis of 
bearings (Nikolaou and Antoniadis 2002; Lou and Loparo 2004) and automatic 
gears (Rafiee et al. 2010) 



4.5.3 Coiflet Wavelet 



The family of the Coiflet wavelets is orthogonal (Daubechies 1992), and near 
symmetric. This property of near symmetry leads to the near linear phase char- 
acteristics of the Coiflet wavelet. They are designed to yield the highest number of 
vanishing moments (2A0 for both the base wavelet of the order A' and the scaling 
function, for a given support width of 6N - 1. Figure 4.6 illustrates the sample 
waveforms of the Coiflet wavelets, with their corresponding magnitude spectra at 
orders 2 and 4, respectively. The Coiflet wavelet has been used for fault diagnosis of 
rolling bearings (Sugumaran and Ramachandran 2009). 
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Fig. 4.6 Coiflet wavelet {left) and its magnitude spectrum (right), (a) Coiflet 2 base wavelet and 
(b) Coiflet 4 base wavelet 

4.5.4 Symlet Wavelet 

Symlet wavelets(Daubechies 1992) are orthogonal and near symmetric. This 
property ensures minimal phase distortion. A Symlet wavelet of order A' has the 
number of vanishing moments N for a given support width of 2N - 1 . They are 
similar to the Daubechies wavelet, except for better symmetry. Waveforms with 
their corresponding magnitude spectra for the Symlet wavelet at orders 2 and 4 are 
illustrated in Fig. 4.7a, b, respectively. Examples of using the Symlet wavelet 
for signal decomposition in manufacturing-related problems include characteriza- 
tion of fabric texture (Shakher et al. 2004) and health monitoring of rolling bearings 
(Gao and Yan 2006). 



4.5.5 Biorthogonal and Reverse Biorthogonal Wavelets 



The family of biorthogonal and reverse biorthogonal wavelets (Daubechies 1992) is 
biorthogonal and symmetric. The property of symmetry ensures that they have 
linear phase characteristics. This type of base wavelet can be constructed by the 
spline method (Cohen et al. 1992). Figures 4.8 and 4.9 illustrate sample waveforms 
with their magnitude spectrum for several biorthogonal and reverse biorthogonal 
wavelets, respectively. In practice, this group of wavelets has been used for surface 
profile filtering in manufacturing process monitoring and diagnostics (Fu et al. 2003). 
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Fig. 4.7 Symlet wavelet (left) and its magnitude spectrum (right), (a) Symlet 2 base wavelet and 
(b) Symlet 4 base wavelet 
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Fig. 4.8 Biorthogonal 2.4 wavelet (left) and its magnitude spectrum (right) 
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Fig. 4.9 Reverse biorthogonal 2.4 wavelet (left) and its magnitude spectrum (right) 



4.6 Application of Discrete Wavelet Transform 
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Fig. 4.10 Meyer wavelet (left) and its magnitude spectrum (right) 
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4.5.6 Meyer Wavelet 

The Meyer wavelet is orthogonal and symmetric. However, it does not have a finite 
support. The Meyer wavelet has explicit expression and is defined in the frequency 
domain as follows: 



W, 



Meyer 



if) 



Ine'^'f sin[fv(3|/| - 1)] 
2n e'^'f 



cos[|v(||/|-l)] 







\<\f\<\ 
f<l/l<f 



(4.55) 



where v() is an auxiliary function, expressed as 

v(a) = a'* (35 - 84a + 70a^ - 20a^), a e (0, 1) 



(4.56) 



The Meyer wavelet with its magnitude spectrum is illustrated in Fig. 4.10. 

Typical applications of Meyer wavelet in manufacturing-related problems 
include signal denoising and bearing fault diagnosis (Abbasion et al. 2007). 



4,6 Application of Discrete Wavelet Transform 

One of the most popular applications of the DWT is to remove noise contained in a 
signal. This is based on the observation that a signal's energy is often distributed 
over a few wavelet coefficients with high magnitude, while energy of the noise is 
distributed across most of the wavelet coefficients with low magnitude. A thresh- 
olding scheme can therefore be devised to remove the noise. Mathematically, 
assume a signal with noise contamination expressed as 



y{t) = x{t) + (je{t) 



(4.57) 
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where x(t) is the signal, e(t) is a Gaussian white noise A'(0,1), and a represents the 
noise level. The objective of denoising is to suppress the noise e(t) and to recover 
the signal x(t). Generally, the denoising procedure consists of three steps: 

1. Signal decomposition: Choosing a base wavelet and a decomposition level 
J, and then performing DWT up to level / on the signal. 

2. Detailed coefficients thresholding: For each decomposition level from 1 to /, 
selecting a threshold and applying it to the detailed coefficients. 

3. Signal reconstruction: Performing wavelet reconstruction to obtain denoised 
signal, based on the original approximate coefficients of level / and the modified 
detailed coefficients of levels 1 to J. 

It should be noted that two thresholding approaches (hard thresholding and soft 
thresholding) can be used in the denoising process (Donoho 1995; Donoho and 
Johnstone 1995). Hard thresholding can be described as the process of setting the 
value of the detailed coefficient djj; to zero, if its absolute value is lower than the 
threshold (denoted as thr). This is mathematically expressed as 



_ I dj^t dj,k > thr 
"■'■•* " 1 dj,k < thr 



(4.58) 



Soft thresholding can be considered as an extension of the hard thresholding, as 
shown in Fig. 4. 1 1 . It sets those detailed coefficients to zero if their absolute values 
are lower than the threshold, and then shrinks the nonzero coefficients toward zero. 
Mathematically, this can be expressed as 
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Fig. 4.11 Illustration of hard thresholding and soft thresholding 
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aj.k 



_ { iign{dj^t){\dj.k\ -thr) 



dj,k ^ thr 
dj,k < thr 



(4.59) 



where 



sgn(4,A- 



dj,k > 

djM<0 



(4.60) 



As an example, Fig. 4.12a shows a "blocks" test signal, and Fig. 4.12b shows 
that it is contaminated by a Gaussian white noise to make a signal-to-noise ratio 
of 4. The signal is decomposed up to level 3, with sym8 wavelet being the base 
wavelet. After performing soft thresholding to the detailed coefficients at each 
decomposition level, the signal is reconstructed as shown in Fig. 4.12c. As only a 
small number of large coefficients characterize the original "blocks" signal, this 
DWT-based denoising method performs well. 



Fig. 4.12 Example of 
discrete wavelet transform for 
denoising. (a) A "blocks" 
signal, (b) a "blocks" signal 
contaminated by Gaussian 
white noise, and (c) denoised 
signal using discrete wavelet 
transform 
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4.7 Summary 

This chapter begins with a description of the discretization of the scale and transla- 
tion parameters. The MRA and orthogonal wavelet transform are then introduced in 
Sect. 4.2. After that, we describe in Sect. 4.3 the dual-scale equation and its 
associated wavelet filter pair. The Mallat algorithm for implementing the DWT is 
then discussed in Sect. 4.4, followed by the introduction of some commonly used 
wavelets in Sect. 4.5. Some typical applications of the DWT are shown in Sect. 4.6. 
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Chapter 5 

Wavelet Packet Transform 



While discrete wavelet transform provides flexible time-frequency resolution, it 
suffers from a relatively low resolution in the high-frequency region. This defi- 
ciency leads to difficulty in differentiating high-frequency transient components. 
The wavelet packet transform (WPT), in comparison, further decomposes the 
detailed information of the signal in the high-frequency region, thereby overcoming 
this limitation. Figure 5.1 schematically illustrates a WPT-based signal decompo- 
sition process, where a four-level WPT produces a total of 16 subbands, with each 
subband covering one-sixteenth of the signal frequency spectrum (Gao and Yan 
2006). The enhanced signal decomposition capability makes WPT an attractive tool 
for detecting and differentiating transient elements with high-frequency character- 
istics. 

In this chapter, we introduce the theoretical basis of a wavelet packet and 
algorithms to realize the WPT. Representative applications of the WPT are then 
introduced to illustrate this computational technique. 



5.1 Theoretical Basis of Wavelet Packet 
5.1.1 Definition 

The wavelet packet is defined by the following equation (Wickerhauser 1991): 

««W = V2^H^)««(2f-fc) 

* with« = 0,l,2,...and/S:=0,l,...,m (5.1) 

k 

with Mq (f) being the scaling function 4>{t), that is, Uq '{t) — 4>{t), and u\ '{t) being 
the base wavelet function i/'(?), that is, u\ (t) = \j/{t) (Wickerhauser 1991). The 
superscript (/) in (5.1) denotes the^'th level wavelet packet basis, and there will be 2' 
wavelet packet bases at the jih level. 
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Fig. 5.1 Procedure for signal decomposition using wavelet packet transform. Note: A approxi- 
mate information, D detailed information, H low-pass filter, G high-pass filter 



To illustrate the derivation process of wavelet packet basis, the Haar wavelet 
(Haar 1910) is used here as an example. The coefficients h{k) and g{k) for Haar 
wavelet are defined as (Daubechies 1992) 



/7(0) =h{\) = —=, h{k)=0 whsnk = 2,3,...,m 

\/2 

g{0)=g{l)^-^, g{k)=0 when /t = 2,3,.... 



(5.2) 



From (5.1) and (5.2), the first level of the Haar wavelet packet basis, indicated by 
the superscript (1) is obtained as 



M)tA-JO) 



u^^\t) = V2^[«("(2r) - «(')(2r - 1)] = u^i\2t) - 4"(2r- 1) 



(5.3) 



Similarly, the second and third levels of the Haar wavelet packet basis can be 
derived using (5.4) and (5.5), respectively: 



f «(''(r) = </.(40 
uf {t) ^ u^^\2t) - u^\2t - \) 

42)(f)^„(2)(2r) + „(2)(2r-i) 
l^ M^ (?) = Mf' (2r)-Mf' (2?-l) 



(5.4) 
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u'i}{t)^u^^\2t) + u^^\lt-l), «=1,2,3 
I 4'Vi(0 = 4'H20 - «r'(2?" 1), « = 0, 1,2,3 



(5.5) 



Figure 5.2a-c illustrates the waveforms of the Haar wavelet packet bases at 
levels 1 through 3 that are derived from the scaling function. Using the same 
approach, the Haar wavelet packet bases at all other levels can be obtained. 
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Fig. 5.2 Wavelet packet bases for the Haar wavelet: (a) level 1; (b) level 2; and (c) level 3 
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5.1.2 Wavelet Packet Property 

Equation (5.1) indicates that the wavelet packet has the following properties 
(Wickerhauser 1991; Coifman et al. 1992). 

5.1.2.1 Shift Orthogonality 

If {un (OInez i^ '^he set of wavelet packet bases obtained from the scaling function 
m[) (f) = (j){t) of an orthogonal base wavelet, then these bases hold the property of 
shift orthogonality: 

«(f), <(?-^))=<5,, k^z (5.6) 

where (•) denotes inner product operation. The symbol (5a represents a Dirac 
function. 

Proof: When n = 0, Mq (f), and uj (?) are the scaled versions of (/>(?) and i/'(f), 
respectively. By definition of the scaling function and base wavelet function, they 
are orthogonal (Daubechies 1992). 

When « ^ 0, as uf [t) and m^ (f) are both a linear combination of uy (t) as seen 
in (5.3) and (5.4), and iq (t) is a scaled version of the wavelet function i/'(f), which 
is orthogonal and normalized (Daubechies 1992), M2 (0 ^'^'^ "3 (^) ^^^ orthogonal. 

As an example, if we have 

*' ,-. (5-7) 

i/^^ {t^k)^V2j2 hk"uf (2f -2k- k") 



where ^' = 0, 1 , . . . , m and ^' = 0, 1 , . . . , m. 
Then 



{uf (r) , u{^ {t-k))=2Y^Y^ h'h', {u{^ (2r - k') , m*-''' [It -2k- k")) (5.8) 



k' k" 



The inner product on the right-hand side of (5.8) is equal to 1/2 when 
k' = 2k + k"; otherwise, it is equal to zero. Therefore, 

(4'' (?) , u^'> {t-k))=J2 hk"h2k+k" = h (5.9) 

k' 

Similarly, u^ {t) and Mj (0 ^""e both linear combinations of Mj (?), and they are 
also orthogonal. Using the same approach, wavelet packet bases of higher levels 
can be derived. 
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5.1.2.2 Orthogonal Relationship between h^^, (f ) and u'2n+i {t) 

(«2]W, «E!+iW)-0 (5.10) 

Proof From (5.1), we have 

(«2:!W, uil.it)) =2 j Y,Y.'''^,g,,u^X^t-^k-k')u'J\2t-k")dt 

J U Ul 

. (5.11) 

= 2 E E ^k'gk" / «!/■' {2t ~2k~~ k')u^^ {2t ~ k") dt 



k' k" 



The result of integral part in (5.11) is equal to zero except when k" = 2k + k' . 
Therefore, 

{^in (0 , "aLi (0) = E ^'^'Sk" = (5.12) 

k" 



5.2 Recursive Algorithm 

Once the wavelet packet basis is defined using (5.1), a recursive algorithm can be 
designed to implement WPT for signal decomposition. The result of the decompo- 
sition is given by (Mallat 1999): 

"' (5.13) 

4+i,2"+i = 2^S[m - 2k)dj^„ 



where fi(,,„ denotes the wavelet coefficients at the /th level, nth subband, dj^-^^m and 
dj+i^2n+\ denote the wavelet coefficients at the (/+l)th level, 2«th, and (2«+l)th 
subbands, respectively, and m is the number of the wavelet coefficients. 

Theoretically, there are multiple ways (>2^) to analyze a signal using an L-level 
decomposition (Mallat 1999). This makes it possible to optimize the signal decom- 
position process and improve the effectiveness. Various criteria, such as lp{p < 2) 
norm, logarithmic entropy, and Shannon entropy, can be utilized as the cost 
function to facilitate the optimization process. A widely applied criterion for 
optimal WPT-based signal representation is the Shannon entropy (Coifman and 
Wickerhauser 1992). For wavelet coefficients at the «th subfrequency band within 
the level /, djj, — {dj„ : n — 1,2,..., 2-'}, the Shannon entropy is defined as 

Entropy {dj^„) = - ^ /?,■ . log(/7,) (5.14) 
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where p, is the probability distribution of the energy contained in the wavelet 
coefficients at the «th subfrequency band within the level /. The probability 
distribution function is defined as 

Pi^\dj,,{i)\^l\\dj,f (5.15) 

with X^'iiP' — 1' ^^^ Pi ' ^^ZiPi — if Pi = 0- The upper limit m represents the 
number of wavelet coefficients at the «th subfrequency band within the level j. 

Equations (5.13) and (5.14) indicate that the entropy of the wavelet coefficients 
is bounded by 

< £entropy(4,«) < l^gl'" (5-16) 

From (5.16), we see that the Shannon entropy will have a large value if the 
energy content is spread out across the constituent wavelet coefficients within the 
subfrequency band. Conversely, it assumes a small value if the energy is concen- 
trated on a few dominant components. As we want the signal information to be 
concentrated within as few coefficients as possible, the minimum Shannon entropy 
should be contained in the wavelet coefficients as a result of the signal decomposi- 
tion. Mathematically, such a process involves comparing the entropy of the lower 
level (e.g., in the subbands Z)A4A and DAAD , Fig. 5.1) of the tree structure with the 
entropy of the higher source level (e.g., subband DAA), starting from the bottom of 
the decomposition (e.g., level 4). If the higher level has returned smaller entropy 
than the sum of the entropies from the lower level, then the higher level subfre- 
quency band will be retained. Otherwise, it will be replaced by the two subfre- 
quency bands at the lower level. Such a process is executed until it reaches the top 
level of the decomposition. 



5.3 FFT-Based Harmonic Wavelet Packet Transform 

Besides the recursive algorithm introduced in the previous section, another algo- 
rithm for WPT, based on the Fourier transform, has been shown to be effective 
when realizing the harmonic wavelet packet transform (HWPT) (Samuel et al. 
2000; Yan and Gao 2005). 



5.3.1 Harmonic Wavelet Transform 

The mathematical expression of the harmonic wavelet is defined in Chap. 3 as 

1 

m <f <n 



'Pnu,if)^ { («-«) ■ (5.17) 

elsewhere 
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Accordingly, its corresponding time domain expression is obtained by taking the 
inverse Fourier transform as (Yan and Gao 2005) 

pjnlnt ^jmlnt 

^,..,M=^^, ^ (5.18) 

jlnyn — m)t 

If the harmonic wavelet is translated by a step kl{m — «), in which k is the 
translation parameter, a generalized expression that is centered at f = k/{n — m) 
with a bandwidth of (« — m) can be obtained as (Newland 1994) 

/ k \ {e'''^<'-^) - e''"^<'-^A 

^nJt =^w u^ — rr-^ (5.19) 

On the basis of the generalized expression, the harmonic wavelet transform of a 
signal x{t) can be performed as 

hwt(m,n,k) = {ii ~ m) / x(t)i//*,„(t ) dr (5.20) 



where hwt{m, n,k) is the harmonic wavelet coefficient. 

By taking the Fourier transform of (5.20), an equivalent expression of the 
harmonic wavelet transform in the frequency domain can be expressed as 

HWTim, n,f) = X(f) ■ <?*((« - m)f) (5.21) 

where X(f) is the Fourier transform of the signal x{t), and f *((« — m)f) is the 
conjugate of f ((« — m)f), which is the Fourier transform of the harmonic wavelet 
at the scale (m, n). As the harmonic wavelet has compact frequency expression as 
shown in (5.17), the harmonic wavelet transform can be readily obtained through a 
pair of Fourier transform and inverse Fourier transform operations (Newland 1993). 
As shown in Fig. 5.3, after taking the Fourier transform of a signal x{t) to obtain 
its frequency domain expression X{f), the inner product HWT{m, n,f) of X{f) and 
the conjugate of the harmonic wavelet W*{{n — m)f) at the scale (m, n) are 
calculated. Finally, the harmonic wavelet transform of the signal x(f), denoted as 
hwt{m, n,k), is obtained by taking the inverse Fourier transform of the inner product 
HWT{m,n,f). 



5.3.2 Harmonic Wavelet Packet Algorithm 

The scale parameters m and n determine the bandwidth that the harmonic wavelet 
covers. Shown in Fig. 5.4a-d are the real and imaginary parts of the generalized 
harmonic wavelet under two exemplary sets of scale parameters, m — 0,n— 16 and 
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Fig. 5.3 Algorithm for 
implementing the harmonic 
wavelet transform 
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m= 16, n = 32, while the translation parameter ^ = 8 remains the same. We can see 
that, through appropriate variation of these two scale parameters, the harmonic 
wavelet can be scaled to match the signal within different frequency regions 
associated with the same bandwidth of (« — m) (16 in this example), as shown in 
Fig. 5.4e-f. As a result, the HWPT operation is realized. 

Similar to the WPT, the number of frequency subbands for the HWPT has to be 
s powers of two, in which s corresponds to the decomposition level of WPT. As a 
result, the signal can be decomposed into 2" frequency subbands, with the band- 
width expressed in Hertz for each subband that is defined by 



/band 



/h 



(5.22) 



In (5.20),/h is the highest frequency component of the signal to be analyzed. As 
the bandwidth of the harmonic wavelet is {n — m), selection of the values for m and 
n of the HWPT has to satisfy the following condition: 

m-n=fi,^nd (5.23) 

Thus, the harmonic wavelet packet coefficients hwpt{s, i, k) can be obtained as 

hwpt{s, i, k) — hwt{m, n, k) (5.24) 

where s is the decomposition level, / is the index of the subband, and k is the index 
of the coefficient. In addition, the parameters m and n need to satisfy the following 
condition: 
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Fig. 5.4 Waveforms of the liarmonic wavelet with their Fourier transforms under different scale 
parameters 



m = i X /band = / X ■ 



« = (/+ 1) X/i 



band 






/==0, 1,...,2'- 1 



(5.25) 



As a result, by selecting the appropriate parameter pairs {m, n) based on (5.25), 
the FFT-based HWPT algorithm can be realized through the computational process 
as illustrated in Fig. 5.3. 
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5.4 Application of Wavelet Packet Transform 

Using the WPT, we can determine a signal's time-frequency composition, thereby 
having a good understanding of what is contained within the signal. Furthermore, 
the WPT can be appHed to remove noise contained in the signal. In the following, 
we demonstrate two examples of these applications. 



5.4.1 Time-Frequency Analysis 

Figure 5.5 shows a vibration signal measured on a ball bearing during a run- 
to-failure test. Physically, when a localized defect is initiated in a rolling element 
bearing, for example, due to spalling on the surface of the bearing raceway, impact 
will be generated every time when a rolling element rolls over the defect. Such 
impacts subsequently excite the intrinsic modes of the bearing system, giving rise to 
transient vibrations at the mode-related resonant frequencies. As the defect size 
increases, different intrinsic modes of the bearing system will be excited, leading to 
frequency shifts of the impact-induced transient vibrations. Therefore, by evaluat- 
ing the time-frequency distributions of the vibration signal, degradation of the 
bearing's health condition can be monitored. 

Applying the WPT to the vibration data, we have seen in Fig. 5.6 that not only all 
the major transient elements are identified, but the corresponding frequency shifts 
are also clearly seen. The result also shows the increased number of frequency 
components after the 45-ms time point, reflecting the defect size propagation. 
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Fig. 5.5 Vibration signal from a ball bearing 
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Fig. 5.6 Wavelet packet transform of the bearing vibration signal 



5.4.2 Wavelet Packet for Denoising 

Figure 5.7a shows a noisy chirp signal, where Gaussian noise is added to the signal, 
leading to a signal-to-noise ratio of seven. The denoising idea illustrated here is in 
principle identical to that developed in the wavelet framework in Chap. 4. The only 
difference is that the WPT provides better flexibility because of a more complete 
analysis of the signal. In this example, the Stein's unbiased estimate of risk (SURE) 
criterion threshold is used to construct the wavelet coefficients (Donoho 1995; 
Donoho and Johnstone 1995). For the purpose of comparison, the signal is pro- 
cessed using both the wavelet packets-based denoising and wavelet-based denois- 
ing techniques, and the results are shown in Fig. 5.7b, c, respectively. It can be seen 
that the performance of the wavelet packets-based denoising approach is better than 
that of the wavelet-based approach. 



5.5 Summary 



This chapter begins with the introduction of a theoretical basis of a wavelet packet, 
where the definition of the wavelet packet and its related properties are presented. 
Two approaches for implementing the WPT are then discussed. Applications of the 
WPT on time-frequency analysis and denoising are illustrated in Sect. 5.4. 
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Fig. 5.7 Example of wavelet 
packet for denoising. (a) A 
noisy chirp signal, (b) 
denoised signal using wavelet 
packet, and (c) denoised 
signal using wavelet 
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Chapter 6 

Wavelet-Based Multiscale Enveloping 



The use of enveloping technique has been found in many engineering fields. 
For example, enveloping is employed for the detection of ultrasonic signals, as 
seen in nondestructive testing (McGonnagle 1966; Greguss 1980; Liang et al 2006). 
It also presents a complementary tool to spectral analysis in detecting structural 
defects in rolling bearings (e.g., surface spalling) and gearbox (e.g., broken teeth) 
(Tse et al 2001; Wang 2001). Generally, three steps are involved in envelope 
extraction, as illustrated in Fig. 6.1. First, the measured signal passes through a 
band-pass filter with its bandwidth covering the high-frequency components of 
interest. As a result, the rest of the frequency components outside of the passing 
band are rejected, leaving only bursts of the band-passed components in the signal, 
as shown in Fig. 6.1b. Next, the band-passed signal is rectified, and shown in 
Fig. 6.1c. Finally, the rectified signal passes through a low-pass filter that is 
designed to allow only the low-frequency envelope of the signal to pass through, 
as shown in Fig. 6. Id. 

A limitation when applying this technique is that it requires a proper filtering 
band to be chosen to accurately extract the signal's envelope, for which a priori 
knowledge of the signal is desired. In this chapter, an adaptive, multiscale envelop- 
ing technique based on the wavelet transform is introduced, which overcomes the 
limitation of the conventional enveloping technique. 



6.1 Signal Enveloping Through Hilbert Transform 

The Hilbert transform has shown to present a good alternative to the conventional 
enveloping technique in extracting a signal's envelope (Hahn 1996). Mathemati- 
cally, the Hilbert transform of a real-valued signal is defined as 

i(0=H[x(r)]= / -^r^dr (6.1) 
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Fig. 6.1 Procedure for 
traditional envelope 
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where H[] denotes the Hilbert transform operator. The symbol x{t) represents the 
Hilbert transform resuh of a real-valued signal x{t), and is the convolution of x(f) 
and (l/nt): 



x{t) = x{t) 



nt 



(6.2) 



where the symbol ® denotes the "convolution" operation. According to the 
convolution theorem, the Fourier transform of the convolution of two signals is 
the product of the respective Fourier transforms of the two signals (Oppenheim 
et al. 1999). Accordingly, the Fourier transform of A-(r) can be expressed as 



X{f)=X{f)xF 



(6.3) 



where the symbol x denotes the "product" operation, X{f) is the Fourier trans- 
form of the signal x{t), and F[\/nt\ denotes the Fourier transform of the term \/nt. 
Specifically, this is defined as 
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11 i-J f>^ 

- ^-jsgnf={ / = (6.4) 

-^J [j /<0 

Combining (6.4) with (6.3) yields 

r ~jx{f) />o 

m=l / = (6.5) 

[ jX{f) f<0 

Through an inverse Fourier transform performed on (6.5), the Hilbert transform 
of the real-valued signal can be realized. Accordingly, a special type of complex- 
valued signal z{t) can now be formulated as 

z(f)=A-(r)+,/i(r) (6.6) 

with the real-valued signak(f) being its real part, and the Hilbert transform of the 
signal, x{t), being the imaginary part. Because of the inherent linearity property of 
the Fourier transform, the corresponding expression of (6.6) in the frequency 
domain can then be given as 

Z(f)^X(f)+jX(f) (6.7) 

Combining (6.7) with (6.5) yields 



~jX{f) />0 r 2X(f) />0 

Z(f)=X(f)+j^ f = o=lxio)f = o. 
jX{f) /<0 [ /<0 



(6.8) 



Equations (6.6) and (6.8) indicate that the complex-valued signal z(f) is analytic 
in nature (Lawrence 1999). This means that it can also be expressed in terms of the 
complex polar coordinates as 

z(0 = a{t) £•'■"('' (6.9) 

where 



a{t) = \/x{tf +x{tf 



(6.10) 



0(0 = tan-' (-^) (6.11) 



,-1 1%') 
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Equations (6.10) and (6.11) are called amplitude envelope function and 
instantaneous phase function of the signal x{t), respectively. This indicates that 
performing the Hilbert transfomi on a real-valued signal x{t), results in the formu- 
lation of a corresponding analytic signal z{t), from which the envelope a{t) of the 
signal can be extracted. Such a property of the Hilbert transform makes it well 
suited for enveloping, as described in the following section. 



6.2 Multiscale Enveloping Using Complex- Valued Wavelet 

Among various base wavelets commonly used for signal analysis (Lee and Tang 
1999; Yen and Lin 2000; Yoshida et al. 2000; Prabhakar et al. 2002; Yan and Gao 
2005a), the complex-valued wavelets have the property of being analytic in nature. 
Such wavelets are generally defined as 

iA(0 = .A«(0 +#/(/) - i^^it) +77/[^«(f)] (6.12) 

where i/'s(/) and (/'/(?) represent the real and the imaginary parts of the complex- 
valued wavelet, respectively, and il/j{t) is the Hilbert transform of \l/f>{t). 

The wavelet transform wt^-^s, t) of a signal x{t) using complex-valued wavelet is 
expressed as 

wt^{s,x) = wr/e(i,T) +jwti{s,t) = wtR{s,t) +jH[wtR{s,x)] (6.13) 

where H'r«(.?, t) and wti{s, t) are the real and imaginary parts of the transformation 
results, respectively. They are defined as 






wt,{s,x)=Yi[wtR{s,x)]^\s\-'l^ 



\dt 



A-(f)H 



lAs 



(6.14) 

r — T\i 

dt 



Equations (6.13) and (6.14) indicate that the results of wavelet transform 
wfc(i, t) of a signal x{t) using the complex-valued wavelet is also analytic. As a 
result, the signal's envelope at scale s, env„,,(^, t), can be readily calculated from the 
modulus of the wavelet coefficients as 



envy,,,{s,x) = \\wtc{s,x)\\ = \J wtR{s,xf + H[wtR{s,x)f (6.15) 

As the wavelet transform itself can be considered as a series of band-pass 
filtering operations (implemented through the scaled parameter s) as described in 
Chap. 3, and the signal's envelope can be obtained by calculating the modulus of 
the wavelet coefficients when the complex-valued wavelet is used, a multiscale 
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Fig. 6.2 Illustration of the multiscale enveloping algorithm 

enveloping technique can be developed on the basis of the wavelet transform. 
Computationally, this technique first decomposes the signal (e.g., vibrations 
measured on a defective rolling bearing) into different wavelet scales by means 
of a complex-valued wavelet transform, as illustrated in Fig. 6.2a. A series of 
wavelet coefficients, which are expressed as real part and imagineiry part, respec- 
tively, are then obtained (Fig. 6.2b). The envelope signal in each scale (Fig. 6.2c) is 
finally calculated from the modulus of the wavelet coefficients. 



6.3 Application of Multiscale Enveloping 

This section describes the application of the multiscale enveloping technique 
introduced above to two different mechanical systems. 



6.3.1 Ultrasonic Pulse Differentiation for Pressure Measurement 
in Injection Molding 



Online monitoring and control of pressure in the cavity of an injection machine has 
been shown to be critically important for improving product quality while main- 
taining low rejection rates in injection molding (Rawabdeh and Petersen 1999). The 
design of a self-powered wireless sensor has enabled the placement of multiple 
sensors within a mold to achieve comprehensive spatial coverage of the cavity 
pressure profile (Gao et al. 2001; Theurer et al. 2001). To overcome electromag- 
netic shielding caused by the steel mold that surrounds the sensors, ultrasonic wave 
has been explored as an alternative to electromagnetic wave for pressure data 
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transmission out of the mold (Zhang et al. 2004). Specifically, mold cavity pressure 
measured by a piezoceramic sensing element is digitized into a series of ultrasonic 
pulse trains, with each pulse train representing the crossing of a preset pressure 
threshold. The actual cavity pressure (denoted as O in Fig. 6.3a) is reconstructed by 
multiplying the total number of the pulse trains (denoted as © in Fig. 6.3a) with the 
known threshold value. Given a matrix arrangement of such wireless sensors within 
the mold cavity, spatial coverage of the cavity pressure profile can be obtained. An 
example of a sensor matrix consisting of six wireless sensors and a single receiver is 
illustrated in Fig. 6.3b. 




■ ;^ J 1 

Time (ms) 

Sefiiiing principle: 9 actual pressure, © measured pressure, and 
® ultrasonic pulse trains 




Receiver 
lUustralton of the sensor array configuration 

Fig. 6.3 Sensing principle and the sensor array arrangement in an injection mold, (a) Sensing 
principle: O actual pressure, measured pressure, and © ultrasonic pulse trains and (b) illustration 
of the sensor array configuration 
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The limitation of such a ID enveloping technique is illustrated in Fig. 6.4, which 
illustrates a total of six ultrasonic pulse trains generated by six transmitters, with the 
center frequencies being 2,210, 2,480, 2,785, 3,140, 3,530, and 3,980 kHz, respec- 
tively. Each pulse train is related to the crossing of the melt pressure of a threshold 
level at a specific location along the cavity. The envelope of the signal is given in 
Fig. 6.4b. By thresholding the enveloped signal, the time of arrival of each pulse 
train can be determined. However, as the difference in frequency of the pulse trains 
cannot be accurately resolved, the spectrum of the multiple pulse trains appears in 
Fig. 6.4c as a lumped group, giving the appearance as if they were generated by a 
single transmitter. 

Such a problem can be solved by using the multiscale enveloping technique 
introduced in this chapter, which decomposes the pulse trains into individual 
frequency subbands and extracts the respective envelope from the pulse trains in 
each subband. Multiplying the number of crossings by the envelope with each 
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Fig. 6.4 Limitation of ID enveloping technique, (a) Original signal consisting of six ultrasonic 
pulse trains of different center frequencies, (b) envelope of the original signal, and (c) spectrum of 
the original signal 
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respective threshold value, the cavity pressure profile can be reconstructed. This is 
illustrated in the following sections, through both simulation and experiments. 



6.3.1.1 Simulation 

The performance of the multiscale enveloping for pulse detection on a sensor 
matrix consisting of six spatially distributed ultrasonic transmitters is evaluated 
first by means of a computer simulation. The six spectrally adjacent ultrasonic pulse 
trains are centered at 2,210, 2,480, 2,785, 3,140, 3,530, and 3,980 kHz, respectively, 
labeled as (T) through (6) in Fig. 6.5. The pulses are separated by an interval of 10 |is 
from one another, simulating the flow of polymer melt over the sensor matrix 
sequentially at a constant speed. As shown in Fig. 6.5, the six pulses could be 
detected and well separated into six levels (each level corresponds to a specific 
scale calculated on the basis of ultrasonic pulse center frequency) through a 
wavelet-based multiscale enveloping process. 

In another simulation, the multiscale enveloping technique is applied to decom- 
posing an ultrasonic signal consisting of two different types of ultrasound pulse 
trains: 

1. spectrally identical (with the same center frequency of 3,980 kHz) and timely 
adjacent (5 |j,s apart), as labeled (T) and (2) in Fig. 6.6a 

2. timely overlapped and spectrally adjacent (with center frequencies of 2,210 and 
2,785 kHz, respectively) as labeled (3) and (4) in Fig. 6.6a. 

As shown in Fig. 6.6b, pulses (T) and (2) are successfully differentiated both 
spectrally (at the same level 6, because of their identical center frequency) and 
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Fig. 6.5 Detection and differentiation of six spectrally adjacent pulse trains 
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Fig. 6.6 Detection and differentiation of two pulse trains that are timely overlapped and spectrally 
adjacent and two pulse trains that are timely adjacent but spectrally identical 

temporally (successive along the time axis, with 5 [is separation). Similarly, the two 
pulses (3) and (4) are well separated into the first and third levels, reflecting on the 
different center frequencies that they contain. 



6.3.1.2 Experimental Study 

To experimentally verify the performance of the developed multiscale enveloping 
technique for ultrasonic pulse detection, three ultrasonic transmitters were designed 
and prototyped, with the center frequencies being 2,480, 2,785, and 3,140 kHz, 
respectively. An electrical pulser (model C-lOl-HV from PAC company) was used 
to electrically excite the transmitters. Ultrasonic pulses generated were then trans- 
mitted through a steel block of 6 cm thickness, which represents a realistic injection 
mold. The pulses were received by an ultrasonic receiver located on the opposite 
side of the steel block. The received ultrasonic pulses were measured and recorded 
using a digital oscilloscope (model TDS 3012B from Tektronix). 

In the first experiment, a single transmitter (center frequency 3,140 kHz) was 
excited repetitively at 10 kHz. As a result, a series of pulses were generated with 
two adjacent pulses being timely separated by 100 |is, as shown in Fig. 6.7a. For 
each train of pulses generated (by each excitation), the first arrived pulse with the 
highest amplitude, plus two reflections with decaying amplitudes were clearly 
observed. The received pulses were processed using the multiscale enveloping 
technique, and their corresponding envelopes were extracted. As shown in 
Fig. 6.7a, the first arrival and the first two reflections were clearly differentiated 
at level 4. As the reflections have much lower amplitude than the first arrival, they 
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Fig. 6.7 Experimental detection of ultrasonic pulse trains having the same center frequency, 
(a) Single transmitter with a temporal pulse separation of 100 (.is and (b) single transmitter with a 
temporal pulse separation of 20 |is 



can be readily eliminated through thresholding from the extracted envelope. In the 
second experiment, the pulse repetition frequency was increased to 50 kHz, result- 
ing in a temporal separation of 20 |j,s between the adjacent pulses. As shown in 
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Fig. 6.7b, the reflections were buried under the first arrivals; therefore, they did not 
affect the pulse detection. 

To evaluate the pulse detector's ability in differentiating spectrally adjacent 
pulses in the frequency domain, the three transmitters (of center frequencies 
2,480, 2,785, and 3,140 kHz) were placed side-by-side on one side of the steel 
block and excited simultaneously, with the excitation repetition frequency being 
30 kHz (corresponding to 33 [is pulse separation). The pulses received by the 
ultrasonic receiver are shown in the upper portion of Fig. 6.8a, where temporal 
overlap of the three transmitters carmot be differentiated in the time domain. 
Applying the multiscale enveloping technique, the envelopes of the three pulse 
trains were successfully extracted and differentiated in levels 2, 3, and 4, respec- 
tively, as shown in Fig. 6.8b. 

To examine the robustness of the multiscale enveloping technique, repetition 
frequency of the excitation input to the transmitters was varied to be 30, 20, 
and 10 kHz for the three transmitters, resulting in a pulse separation of 33, 50, 
and 100 [is, respectively. As shown in Fig. 6.8b, the pulse trains were again 
successfully detected and differentiated, with the corresponding envelopes sepa- 
rated into levels 2, 3, and 4, respectively. 



6.3.2 Bearing Defect Diagnosis in Rotary Machine 

A large number of applications in machine condition monitoring involve rotary 
machine components, for example, bearings, spindles, and gearboxes (Kiral and 
Karagiille 2003; Wu et al. 2004; Choy et al. 2005). To detect structural defects that 
may occur in these machine components, spectral analysis of the signal's envelope 
has been widely employed (McFadden and Smith 1984; Ho and Randall 2000). This 
is based on the consideration that structural impacts induced by a localized defect 
often excite one or more resonance modes of the structure and generate impulsive 
vibrations in a repetitive and periodic way. Frequencies related to such resonance 
modes are often located in higher frequency regions than those caused by machine- 
borne vibrations, and are characterized by an energy concentration within a rela- 
tively narrow band centered at one of the harmonics of the resonance frequency. By 
utilizing the effect of mechanical amplification provided by structural resonances, 
defect-induced vibration features can be separated from the background noise and 
interference for diagnosis purpose. However, as different resonance modes can be 
excited under varying machine operating conditions, consistent results are not 
guaranteed by simply applying the traditional enveloping spectral analysis. 
Research has found that complementing the wavelet-based multiscale enveloping 
with spectral analysis by means of the multiscale enveloping spectrogram 
(MuSEnS) technique could significantly enhance the effectiveness of bearing defect 
diagnosis (Yan and Gao 2005b). Basically, the MuSEnS starts with a signal's 
envelope extraction by using the developed wavelet-based multiscale enveloping 
technique; Fourier transform is then performed repetitively on the extracted 
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Fig. 6.8 Experimental detection and differentiation of temporally overlapped and spectrally 
adjacent ultrasonic pulse trains generated by three transmitters, (a) Three transmitters with pulse 
separation of 33 [is and (b) three transmitters with pulse separation of 33, 50, and 100 ^ts 

envelope signal env„i{s, t) at each scale s, resulting in an "envelop spectrum" of the 
original signal at the various scales. Such envelop spectra can be expressed as 



ENV^sJ) = F[env^,is,t)] 



1 

271 



\\wt,{s,r)\\e-'^''f' dr 



(6.16) 
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where the envelope signal env„,,(5, t) is obtained using (6.15), and calculated 
directly from the modulus of the wavelet coefficients 
|M'rc(s, t)|| of the original signal. The result of the ENVni{s,f) operation is a 2D 
matrix, with each of its rows corresponding to the envelop spectrum of the vibration 
signal at a specified scale s, and each of its columns corresponding to a specific 
frequency component of the envelope spectrum across all the scales. By looking at 
the square of the magnitude of ENVn-t{s,f), as seen in 

Eis,f) = \ENV„,(sjf (6.17) 

which is termed as the energy spectrum, the final output E(s,f) of the MuSEnS is 
obtained. The energy spectrum indicates how the energy content is distributed in 
the scale-frequency plane. For the purpose of visualization, such a result can be 
illustrated in a 3D scale-frequency-energy map, which can indicate the intensity 
and location of the defect-related frequency lines. The applications of the MuSEnS 
technique to bearing defect diagnosis are introduced in the next section. 



6.3.2.1 Numerical Simulation Using the MuSEnS Algorithm 

A synthetic signal that consists of different signal components for simulating 
vibration signal from the rolling element bearing is first constructed to quantita- 
tively evaluate the MuSEnS technique. Generally, vibration signals from a bearing 
may include the following constituent components: 

1. vibration caused by bearing imbalance with a characteristic frequency of /u, 
equal to the bearing rotational speed, which occurs when the gravitational center 
of the bearing does not coincide with its rotational center 

2. vibration caused by bearing misalignment at frequency /,„, equal to twice the 
shaft speed, which occurs when the two raceways of the bearing (inner and 
outer) fall out of the same plane, resulting in a raceway axis that is no longer 
parallel to the axis of the rotating shaft 

3. vibration due to rolling elements periodically passing over a fixed reference 
position on the outer raceway, at the frequency /bpfo 

4. structure-borne vibration attributed by other components, which is broadband in 
nature, and can be modeled as white noise. 

When a localized structural defect occurs on the surface of the bearing raceways 
(inner or outer), a series of impacts will be generated every time the rolling 
elements interact with the defects, subsequently exciting the bearing system. 
Such forced vibration is represented by high-frequency resonances that are ampli- 
tude modulated at the repetition frequency of the impacts. 

For the numerical simulation, only defect-induced resonant vibration and 
structure-borne vibration are considered in the synthetic signal, as other vibration 
components can be filtered out through data preprocessing. The simulated resonant 
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vibration is obtained experimentally from the measured impulse response of a 
ball bearing (model 2214). This bearing has 17 rolling elements. When it rotates at 
300 rpm, a total of eight impacts will be generated per bearing revolution, because of 
the ball-defect interactions. This translates into an impact interval of 25 ms or a 
40-Hz signal repetition frequency. Figure 6.9a illustrates such a series of impact- 
related vibrations. By adding white noise to these vibrations, a synthetic signal is 
then generated to simulate the actual bearing vibrations due to a localized outer 
raceway defect. The signal-to-noise ratio (SNR) of the synthetic system is set at 
— 12 dB. The synthetic signal with its time and frequency domain waveform is 
shown in Fig. 6.9b, c. Because of the noise corruption, no apparent signal feature 
could be identified, except for the relatively dominant spectral components ranging 
from 2,500 to 3,500 Hz. 

The synthetic signal is analyzed using the wavelet-based MuSEnS technique, 
where the complex Morlet wavelet is used as the base wavelet for defect character- 
istic extraction. A series of equally spaced scales ranging from 1 to 6 (with an 
increment of si) were chosen to stretch the complex Morlet wavelet for extracting 
the defect-related feature embedded in the synthetic signal. The lower and upper 
limits of the scales correspond to the wavelet center frequency at 10,000 and 
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Fig. 6.10 Defect repetition frequency detection on the synthetic signal using multiscale envelop- 
ing spectrogram (MuSEnS) technique 

1,667 Hz, respectively. This ensures that the defect-induced resonant vibration 
component can be fully covered by the wavelet transformation. To increase the 
possibility of matching the center frequency of a scaled wavelet with the frequency 
of the defect-induced resonant vibration, a small-scale interval is preferred. How- 
ever, a small-scale interval leads to increased computational load, as more scales 
will be involved in the signal decomposition. A trade-off must therefore be made 
between the accuracy and computational time. On the basis of preliminary studies, 
an increment of Si = 0.2 was employed in this study. As the MuSEnS shown in 
Fig. 6.10, high-energy concentration can be identified at the 40-Hz frequency line, 
which corresponds to the defect-related repetition frequency. Also strongly repre- 
sented in the spectrogram is the harmonics of the defect-related frequency at 80 Hz. 
This result demonstrates the effectiveness of the MuSEnS algorithm in identifying 
defect features hidden in bearing vibration signals. 



6.3.2.2 Case Study 

The first experimental case study of using MuSEnS algorithm to diagnose bearing 
defects is conducted on a roller bearing. A seeded defect in the form of 0. 1 mm 
diameter hole is made in the outer raceway. The bearing is subject to a 3,665-N 
radial load, and the shaft rotational speed is 1,200 rpm (or a 20-Hz rotational 
frequency). On the basis of the geometrical parameters of the bearing and the 
rotational speed, a defect-related repetitive frequency of (fsppo — 5.25/ipm) or 
105 Hz can be analytically determined (Harris 1991). Figure 6.1 1 shows the bearing 
vibration signal acquired under the sampling frequency of 25 kHz. From its 
corresponding power spectrum, it is evident that frequency component related to 
bearing rotation is dominant in the frequency region of [0, 150] Hz. However, 
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Fig. 6.11 Signal measured from a roller bearing and its power spectrum 



defect-related frequency component of 105 Hz is submerged in the spectrum and 
therefore, cannot be identified. 

The MuSEnS algorithm is then applied to decompose the bearing signal. The 
scales used are between 1 and 8, with an increment 0.2. These scales cover the 
frequency range of 1.56-12.5 kHz. The corresponding MuSEnS of the bearing 
vibration signal is shown in Fig. 6.12. Two major peaks are clearly shown at 20 and 
105 Hz frequency lines, respectively. The 20-Hz component runs across the entire 
scale region, and is related to the bearing rotating speed. The 105-Hz component is 
identified at the scales of 1-2.4, and represents the repetitive frequency of the 
bearing due to the structural defect on the outer raceway. This demonstrates that the 
MuSEnS is able to clearly identify the existence of the structural defect, and 
pinpoint its location on the outer raceway for diagnosis purpose. 

The second experimental case study of using the MuSEnS algorithm for diagno- 
sis of bearing inner raceway defect is investigated on a ball bearing of model SKF 
6220. The defect-related repetitive frequency is calculated to be (/bpfi = ^■9frpm) or 
59 Hz, basedon the bearing geometry and rotational speed (600 rpm) (Harris 1991). 
A radial load of 10,000 N is applied to the bearing. As shown in Fig. 6.13, while 
frequency components related to the shaft speed and ball rotation are shown in the 
power spectrum, the defect-related repetitive frequency is not identified. 

The MuSEnS algorithm is then applied to the same signal, with the decomposi- 
tion scales chosen to be '^2-10 at an increment of 0.2. The scales cover the 
frequency range from 500 to 2,500 Hz. As shown in Fig. 6.14, besides frequency 
components related to the shaft frequency and its harmonics, an appreciable peak 
can be seen at 59 Hz, which is the inner raceway defect-related repetitive frequency. 
This indicates that a structural defect exists on the inner raceway of the ball bearing. 
The peaks at 49 and 69 Hz frequency lines are attributed to the combined effect of 



6.4 Summary 



99 



0.2^ 



105 Hz 




150 



60 
Frequency (Hz) 



Fig. 6.12 MuSEnS of vibration signals measured on a roller bearing with a structural defect on the 
outer raceway (speed: 1,200 rpm; radial load: 3,665 N) 
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Fig. 6.13 Signal measured from a ball bearing and its power spectrum 



2500 



bearing imbalance at 10 Hz frequency and the stractural defect at 59 Hz frequency, 
as they can be calculated as 59 ± 10 Hz. 



6.4 Summary 



A wavelet-based multiscale enveloping technique is introduced in this chapter. This 
multidomain signal processing technique combines band-pass filtering (implemen- 
ted through variation of the scale parameter s of the base wavelet) and enveloping 
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Fig. 6.14 MuSEnS of vibration signals measured on a ball bearing with a structural defect on the 
inner raceway (speed: 600 rpm; radial load: 10,000 N) 

(obtained through modulus of the wavelet coefficients) into a single-step operation. 
The effectiveness of the muUiscale enveloping technique is demonstrated through 
studies in ultrasonic pulse differentiation for pressure measurement in injection 
molding and bearing defect diagnosis in rotary machines, both numerically and 
experimentally. 

When the multiscale enveloping technique is used to identify the ultrasonic 
pulses generated from injection molding process, not only spectrally identical and 
timely adjacent but also timely overlapped and spectrally adjacent ultrasonic pulses 
could be detected and differentiated. This allows for comprehensive spatial cover- 
age of the cavity pressure profile by placing multiple sensors with different working 
frequency ranges at different locations in the injection mold. When the wavelet- 
based enveloping is combined with spectral domain postprocessing, a new algo- 
rithm termed MuSEnS was developed, which has been shown to be more accurate 
and illustrative in depicting critical features related to structural defects embedded 
in the bearings than the traditional enveloping spectral analysis. As many of the 
applications in manufacturing equipment and system monitoring involve rotary 
machine components (e.g., bearings, spindles, gearboxes, etc.), it is possible that 
the MuSEnS technique may contribute to improving solutions to a wide variety of 
machine monitoring problems. 
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Chapter 7 

Wavelet Integrated with Fourier Transform: 

A Unified Technique 



Fourier transforai-based spectral analysis has been widely applied to processing 
signals, such as vibration and acoustic signals (Mori et al. 1996; Tandon and 
Choudhury 1999; Cavacece and Introini 2002), acquired from manufacturing 
systems. Because of noise contamination and signal interference, the constituent 
components of interest may be submerged in the signal and difficult to be revealed 
through a spectral analysis (Ho and Randall 2000). Furthermore, events occurred in 
the manufacturing system may be transient in nature, for example, the initiation and 
propagation of surface spalling in a ball bearing (Gao and Yan 2006; Orhan et al. 
2006). As another example, the process of metal removal can be viewed as 
consisting of multiple, individual transient events in which a single chip of metal 
is removed (Ge et al. 2004; Obikawa and Shinozuka 2004; Byrne and O'Donnell 
2007; Malekian et al. 2009). On the one hand, given the global analysis nature, it is 
difficult to apply the Fourier transform for localizing these transient events. On the 
other hand, the Fourier transfomi can identify a signal's frequency components, 
from which a specific event (e.g., localized bearing defects, which have distinct 
characteristic frequencies at inner raceway, outer raceway, or rolling element itself) 
can be detected. Leveraging the capability of wavelet transform in transient signal 
analysis, this chapter introduces a unified time-scale-frequency analysis technique 
through spectral postprocessing on the data set extracted by wavelet transforms to 
enhance the effectiveness of signal representation and identification. 



7.1 Generalized Signal Transformation Frame 

Fourier transform and wavelet transform were originated from different theoretical 
platforms, and each technique analyzes a signal from a different perspective. 
Specifically, the Fourier transform depicts the energy concentration of constituent 
frequency components within the signal, whereas the wavelet transform presents 
the similarity between the signal being analyzed and the base wavelet, in the 
time-scale domain. To enable cross-domain unification of the two techniques for 
signal analysis, a common signal transformation platform needs to be established 
first, which is the focus of this chapter. 
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Let us first define a function WioiO within a certain time interval, or support, 
expressed as [0, L), wiiere the symbol L represents the width of support. The 
function VKi_o(0 is called the base template function for signal analysis. Next, we 
define W^, u(t), which is a derived version of Wioit). Comparing to Wioit), the 
magnitude of VK, „(0 is scaled by scale s, where i > is an integer number, and its 
location along the time axis is shifted by time u, with u^R, and R represents the set 
of all real numbers. The function VKj „(0 is called the derived template function, at 
scale s and time u, and is supported within the time interval [u, u + sL]. Generally, 
W,„(0 can be expressed in terms of the base template function Wyjait) as: 

where l/^/s is a factor for purpose of normalization. Specifically, it ensures that the 
following relationship between the derived template function VKj „(0 and the base 
template function W\fl{t) is always satisfied: 



Wl{t)dt^ wUt)dt^\\Wifl{t)\\' (7.2) 



The physical significance of (7.2) is that all the derived template functions 
preserve the same amount of energy as the base template function does. 

In a linear signal space, the set of all derived template functions {Ws,uit): s>0, 
u ^ R) forms a continuous frame Fc, spanned by scale s and time u. In accordance 
with the discrete data sampling process, where data points are taken at time 
instances u — mkL from a derived template function with a scale factor of s — k, 
a discrete expression of the derived template function is obtained as: 

where VK^ „,(0 is a simplified expression for VK^ „^^i(0• In the above notation, k or 
k~^ ^ N, m ^ Z, and k and m represent a discrete version of the continuous 
parameters s and u. The notation k~^ e A' corresponds to s<l. The set {W/^^ „,{t): 
k or k~^ '^ N, m ^ Z], with A' being the set of all nonnegative integers and Z being 
the set of all integers, forms a discrete frame T^, spanned by the parameters k and m. 
The continuous frame F^ (or discrete frame F^) provides a generalized frame for 
signal transformation, and is defined as complete in the linear signal space, if any 
signal function x(t) can be expressed in it as (Kaiser 1994): 

x(?) = / / C{s,u)Ws,,it)dsdu (7.4) 

Jo J-oo 

or, in the case of discrete frame F^: 
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x(r)=^ ^ C(A:,m)iy,,„(f) (7.5) 

k—l m——oc 

In (7.4) and (7.5), the coefficient of the functions, C(s, u) or C{k, m), can be 
considered as a measure function, which expresses the extent to which the signal 
function x{t) is correlated to the derived template function | Ws.uit)'- s>0,u^R} of 
scale s, at the specific time u. 

Under the discrete frame T^, the significance of the measure function expressed 
in (7.5) can be further illustrated, considering that there exists a complete orthogo- 
nal set {Wi,„,(0) within such a frame. The orthogonal identity states that: 



/•^ f / Wl ^ (t)dt; for /t2 = yti, 1712 

/ W,„„,it)Wt,^„,,it)dt = \ J-00 '"'"'^^ 



(7.6) 



0; otherwise 



where k^ and ^2 ^ R!> and m^ and OT2 ^ {'''I- Using the identity of (7.6), multi- 
plying the two sides of (7.5) by Wk^^m^{t), and integrating over the time interval 

(— 00, 00 ), we have: 



x{t)Wk,.,n, {t)dt = / Y.Y. *^(^' '^OW'/,„,(OWa,,«„ {t)dt 

^C{kum,) rwl,Jt)dt 

J —OC 



(7.7) 



With k—ki and m = niy, rearranging (7.7) yields, 

c{Km)- j^^^2j^^,^ r^wut)dt ^'-'^ 

The equation indicates that the measure function C(k, m) expresses the correlation 
(or similarity) between the signal function x{t) and the derived template function 
^k,m{t) of scale k and at time mkL. This concept can be expanded to view a signal 
transformation operation, such as Fourier or wavelet transform, as a correlation 
operation between a signal and a template function, and the result expresses the 
measures of correlation between the two functions. When a template function 
derived from a base template function has a large value of C(k, m), or correlation, 
with a signal feature at certain scale k and time mkL, the template function is said to 
have a good match with that corresponding feature. As a result, the feature will be 
effectively extracted by this particular template function. Signal components that 
are of little correlation to the template function will show small or no correlation 
measures, and thus be suppressed in the analysis. A signal may show different 
correlation measures with different base template functions. In Table 7.1, several 
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Table 7.1 Basic template functions expressed in the generalized transformation frame 
Frame {Wt,mit)] Properties 

Fourier Wio{t) = e^^'^"^''. f6[0,L) Exponential function forms a 

function complete orthogonal base 

Haar f +1 < t<L/2 Rectangular waveform forms a 

function W'i.o(f) = S f /o ^ / ' '^ ["'^J complete orthogonal base 

Daubechies Wt n(t) = tl/^"Ut) f 6 fO L) " Fractal shape forms a complete 

function ' orthogonal base 

"There are different Daubechies functions i/'i"o(f) 
depending on different order of n 



basic template functions are expressed in the generalized signal transformation 
frame, and they are discussed in the following sections. 



7.1.1 Fourier Transform in the Generalized Frame 

A signal x(t) of period T can be expressed through its Fourier transform as 
(Bracewell 1999): 

OC 

x{t) = Y^ c„e"-''2™'/^, -oo<f<oo, neN (7.9) 

n=0 

where c„ is the nth-order transformation coefficient. If a single-period complex 
exponential function is defined as the base template function: 

Wifi{t) ^ e-j^^''\ te[0,L) (7.10) 

then the corresponding derived template function can be expressed, per definition in 

(7.1), as: 

iyt.„(r) =^e"-''2''(H:-"0, te[mkL,mkL + kL) (7.11) 

Using the derived template function, the signal x{t) shown in (7.9) can be 
expressed, in the generalized transformation frame, as: 



-<0 ^Y.Y.Cik,m)WUt) = ^^^C(^,^)e-^'2''(^-"') 

k m k m VK 

E \Y.^^C{k,m)\c-'^^r^ ^Y.CtVk{t) 



k \ m 
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where the term 

C, = Y,\c{k,m) (7.13) 

represents the sum of the normahzed individual measure functions corresponding to 
the discrete scale k in the Fourier transform, and the term 

V,^{f) = e-^^Tt^ (7.14) 

represents a periodic exponential function with a period of kL over the time 
interval (— oo, oo), for a given scale k. Comparing (7.9) with (7.12), it can be 
seen that k ~ l/n when L — T, and c„ = Q, The expression k— 1/n indicates that a 
higher scale (k) corresponds to a lower frequency («), and c„ — Q is defined only 
by the frequency « or scale k. From this discussion, the Fourier transform can be 
viewed in the generalized frame as a ID function, defined by the scale k with the 
orthogonal base |Vfc(0}- There is no time information of the extracted signal 
features contained in this function. This explains why the Fourier transform does 
not provide time information of the extracted frequency components. 



7.1.2 Wavelet Transform in the Generalized Frame 

Wavelet transform decomposes a signal using finite time intervals (or support) at 
different scales, thus maintains the time location information of the signal features. 
Through variation of the scales of the template function used, nonstationary or 
transient features within a signal can be extracted more effectively than by using 
the exponential (sine and cosine) functions in the Fourier transform. The wavelet 
transform can be expressed as: 

/oc 
x{t)Ws,u{t)dt (7.15) 

-oo 

where the term C(s, u) represents the wavelet coefficients (or measure function, in the 
generalized signal transformation frame). The wavelet function Ws^JJ) is defined 
by (7.1). To reduce computational load and avoid redundancy in signal representa- 
tion, discrete instead of continuous wavelet transform is often employed for analyz- 
ing signals consisting of discrete data points acquired through a data acquisition 
process. Generally, a discrete wavelet transform discretizes a signal by using a scale 
of the power of 2 (Daubechies 1992; Kaiser 1994). At the scale k = 2" {n e N), the 
discrete wavelet function VKi-_„,(0 {ni £ Z) has the support of T^ — 2"L. Physically, the 
term 2"L represents the time resolution, which increases linearly with the logarithm of 
the scale, enabling signal feature extractions at different resolutions. The frequency 



108 



7 Wavelet Integrated with Fourier Transform: A Unified Technique 



resolution of wavelet transform is the inverse of the time resolution, or l/r„. 
It increases as the scales become higher (i.e., when n increases), and is therefore 
well suited for analyzing slow-changing signals. At lower scales, the frequency 
resolution decreases, enabling analysis of fast-changing signals. Such is in contrast 
to the Fourier transform, which maintains a constant frequency resolution over the 
entire spectrum, and have a time resolution that is defined by the signal duration. 

To illustrate the ability of wavelet transform in signal feature extraction, we 
analyze an frequency shifted keying (FSK) signal, which is commonly used for data 
modulation and wireless data transmission (Gibson 1999). An FSK signal is 
expressed as: 



4') 



square (271/1?) for message " 1 " 
square (27:/2?) for message "0" 



(7.16) 



where square {Infj) represents a periodic square wave with a unit amplitude and 
frequency/,,. Figure 7.1 illustrates an example of such an FSK signal, where the 
frequency /i = 30 Hz is used to transmit the digit "1" and /2 = 125 Hz is used to 
transmit the digit "0." The message to be transmitted is [1 1 10 10 0]. Such a 
signal x{t) is nonsinusoidal and nonstationary. 

To analyze this signal, we choose the Haar wavelet as the base wavelet, since its 
square-shaped wave form best matches the shape of the FSK signal. Given that the 
wavelet has a support of L = 1 s (refer to Table 7.1), the measure function C{s\, m) 
can be calculated by using (7.8). For the message "1," C(.?i, ni) is calculated to 
be ^/s\, at scale S\ — l/fi and time t = msi. The measure C(si, m) is zero for the 
message "0." Similarly, at scale S2 ~ l/fi, C{s2, m) is y/si at time t — ms2 for 
the message "0," and zero for a message " 1 ." The result of such a wavelet transform 
operation indicates that the Haar wavelet was able to locate the time instant of the 
message "1" or "0" at scales Si and S2, respectively, and expresses the FSK signal 
x{t) in the generalized signal transform frame as a single-form expression: 



x{t) 






c(.i,«2)f(;,uo- 



C{s2,m)^ilUt) 



(7.17) 
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Fig. 7.1 A FSK signal x(t) 
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Extracted FSK signal for message "1" 
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Extracted FSK signal for message "0" 
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Fig. 7.2 Extracted FSK signals, (a) Extracted FSK signal for message "1" and (b) extracted FSK 
signal for message "0" 

The results are shown in Fig. 7.2, where the messages "1" and "0" can be clearly 
separated into two different scales. In comparison, it is not feasible to use the 
Fourier transform to specify at which time which message (1 or 0) is transmitted. 
The sine and cosine template functions in the Fourier transform do not match the 
square waveform of the FSK signal, and consequently, the frequency component of 
the FSK signal will be spread out in a broad spectrum, especially when the message 
"0" and "1" are randomly transmitted. 



7.2 Wavelet Transform with Spectral Postprocessing 



As a time-scale domain technique, the wavelet transform utilizes template func- 
tions of different time resolutions at different scales to extract "transient" features 
embedded in a signal. Such transient features can be generated by the interactions 
between the rolling elements in bearing and a localized defect (e.g., surface spal- 
ling) on the surface of the raceway. As the rolling elements periodically roll over 
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the localized defect, the "transient" feature will reoccur at a fundamental frequency 
/o, which is a function of the bearing rotational speed. Such a relationship will be 
reflected in a wavelet transform of the bearing vibration signal, in that the measure 
function C{si, m) will retain the same fundamental frequency along the time axis, at 
one of its scales (^i). As a result, the spectral feature of the transient signal is 
retained in the wavelet transform, although it is not explicitly expressed. Because of 
masking by noise and other signals with similar spectral characteristics that appear 
at the same scale, it can be difficult to rely on wavelet transform alone to identify 
such hidden patterns. 

Such constraint of the wavelet transform can be compensated for by subse- 
quently applying the Fourier transform to the measure function C{si, m) resulting 
from the wavelet transform. Such a postspectral technique reveals the specific 
frequency location of the transient features, and presents a unified approach to 
transient signal processing. The following section explains how such a postspectral 
method is realized. 



7.2.1 Fourier Transform of the Measure Function 

In a complete linear signal space, the wavelet-extracted data set at scale s can be 
expressed as: 



x,(r) = / C,{u)W,{t - u)du = C,(r) ® W,{t) (7.18) 

where the symbol (^ represents the convolution operation between the measured 
function C^iu) and the wavelet function W^iu). To perform Fourier transform on the 
data set, the Fourier transform of the measure function Cs{u) is first derived. For this 
purpose, the wavelet transform defined in (7.15) at a fixed scale s is rewritten as: 

/oc 
x{t)Ws{t-u)dt (7.19) 

-oo 

In (7.19), the terms CXu) and W^it — u) represent their respective counterparts in 
(7.15), C{s, u) and Ws,,i(t), with a fixed scale s. Through a normalization operation, 
l|M^i,o(OlP in (7.2) is set as 1 for simplicity. With respect to time u, the Fourier 
transform of Csiu), denoted as Csif), is derived as: 

Csif) = x(f)Ws.u(f) (7.20) 

where the symbol x(f) expresses the Fourier transform of the signal x{t). The 
symbol Ws,ii(f) expresses the Fourier transform of the wavelet function Vl^s,„(0, 
which is derived as: 
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W,At)e-^^^^' dt - j^ -^ Wu, (-^\ e-'^^f dt 



7jy'-(-^''-" 



(i^).-«'[...(^) 



V~s / Wi,o 



=y/si 



-y2jc/" 






(7.21) 



„(l^),-./2^A(-)^(i^) 



=j:^e-j^''f"^ 



Combining (7.21) witli (7.20) yields: 

Let Wl^{sf) = e-J^''f"Wifi{sf), (7.22) can be further expressed as: 

c,(/) = Vsmwioi^f) 



(7.22) 



(7.23) 



where the superscript * denotes the conjugate operator. 

Similar to (7.19), in the case of discrete wavelet transform, the discrete measure 
function CiJjn) at a fixed scale k can be expressed as: 



Ck{m) 



x{t)Wk{t - mkL)dt 



The corresponding Fourier transform of CiJjn) is expressed as: 
In (7.25), Wk^mif) is derived as follows: 



(J2A) 



(7.25) 



Wk,m{f) 



1 ft - mkL 



,-flnfr 



dt 



1 

7k 



^1,0 f^).--^' 



k-d 



t - mkL\ 
k j 



--V~k r 1^1,0 (J - mL)e->2'^^(^"'^+"'^)*j(^ - ml) 
--Vke-'^-f"*'^ r Wi,,{^- - mL)e-'2-^(i-'"^)rfg - ml) 



(7.26) 
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As a result, (7.25) is given by: 

Ck(f) = xifWk.„,if) - m^ke-J^-f-'^WUkf) (7.27) 

Let W't,o(¥) = Wifl{kf)e-i^''f"'''^, (7.27) can be further expressed as: 

Ckif) = V~ki(fWxfl{kf) (7.28) 

Equations (7.23) and (7.28) illustrate that the Fourier transform of the measure 
function at scale s (for continuous transform) or k (for discrete transform) can be 
viewed as the original signal x(t) passing through a data filter, which is a contracted 
(by a frequency factor of s or k) and amplified (by a factor of ^/s or\fk) version of 
the filter represented by the base wavelet function. Such an operation establishes the 
link between measured function and data filtering, and is of significance in wavelet 
transform-based signal analysis. 

7.2.2 Fourier Transform of Wavelet-Extracted Data Set 

With the Fourier transform of the measure function obtained, the Fourier transform 
of the extracted (or reconstructed) data set from the continuous wavelet transform, 
denoted x^it) as shown in (7.18), can be derived as: 

=^fsx{f)W\^{sf)^sW,,Q{sf) (7.29) 

=sx{f)Wifi{sff 

In case of a discrete wavelet transform, the Fourier transform of the data set Xjjit) is 
obtained by setting s = k and u = mkL in (7.18), and its Fourier transform is 
expressed as: 

Xkif) ^Ck{f)Wk{J) 

=Vkx(f)Wlo{kf)VkWi,o{kf) (7.30) 

=kx(f)\Wi.om\^ 

This indicates that the Fourier transform of the extracted data set x/.{t) at scale k can 
be viewed as the Fourier transform of the original signal x(t) passing through a low- 
pass filter and the filter being represented by the transfer function |VV'i.o(^/) | ■ If the 
template function at scale k correlates well with the "transient" feature of the signal 
x(t) in the time domain, then its Fourier transform will contain a strong "distur- 
bance" component in its spectrum. As a result, the filter |VV'i,o(^/) | will extract the 
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Wi^oikf) Filter 
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Frequency 
Fig. 7.3 Illustration of the filtering effect of wavelet transfonn 

"disturbance" signal features from the original signal x{t) at the scale k, as shown in 
(7.29) and (7.30). Because of a lower degree of correlation between this filter and 
other constituent components in the signal, other components will be attenuated at 
the scale k. 

The filtering effect of postspectral processing of a wavelet transformed data 
series is illustrated in Fig. 7.3. The "transient" feature, represented by the solid thick 
line, is shown to have a fundamental frequency /i, characterized by a magnitude 
peak at/i and its harmonics (2fi and 3/i) in the frequency spectrum. When an 
appropriate base wavelet is selected (or designed) to decompose this signal (Holm- 
Hansen et al. 2004), a base template function will exist at a certain scale k where a 
high degree of correlation between the template function and the "transient" 
features can be identified. If the data set resulting from such a wavelet transform 
is subsequently processed by the Fourier transform, the result will be a data 
spectrum similar to that of the "transient" signal, with its major frequency compo- 
nents at/i, 2/i, and 3/i, respectively. Such a postspectral analysis can be viewed as 
filtering the data set in the frequency domain denoted by Wi,o{kf). 



7.3 Application to Bearing Defect Diagnosis 



This section illustrates how the unified time-scale-frequency analysis technique 
described earlier can be applied to rolling bearing defect diagnosis. A custom- 
designed bearing test bed, as shown in Fig. 7.4, is set up to provide an experimental 
platform for evaluating the developed method. Axial and radial loads on the test 
bearing are applied through a hydraulic system, and the bearing rotation speed is 
varied by controlling the DC drive motor. Commercially available accelerometers 
(model 8624) are placed on the housing for vibration measurement with a data 
sampling frequency of 10 kHz. A deep-groove ball bearing of type 6220 with a 
seeded structural defect serves as the test bearing. The defect is implemented as a 
0.25-mm diameter hole drilled on the inner raceway of the bearing, simulating the 
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Beaiing test bed 



Speed mea;U[«inent— I 
dice 




Fig. 7.4 Bearing test bed with hydraulic load application capability 



condition of a surface spall. The relationship between the bearing rotational 
frequency /^ and the characteristic frequencies associated with defect-induced 
vibrations can be detemiined analytically as a function of the defect location, for 
example, on the inner raceway (/bpfi), outer raceway (/bpfo) or a rolling element 
(/bsf) (Harris 1991). Specifically for the test bearing, these characteristic frequen- 
cies are calculated as/eppj « 5.86/r,/BPFo ~ 4.1/f,/BSF ~ 5.3/r, respectively (SKF 
1996). By identifying the existence of these characteristic frequencies and/or their 
combinations, the existence of bearing structural defects can be determined. 
In the experimental evaluation, following aspects are studied: 

1. The effectiveness of the unified time-scale-frequency analysis technique in 
extracting defect features (i.e., characteristic frequencies) from bearing vibration 
signals is compared to that of the Fourier transform and the discrete wavelet 
transform when it is applied alone. 

2. The effectiveness of the new technique at different wavelet decomposition 
levels. 

3. The effectiveness of the new technique under varying bearing operating condi- 
tions, such as the radial load, axial load, and shaft rotational speed. 
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7.3.1 Effectiveness in Defect Feature Extraction 

To establish a basis for objective comparison, vibration signals are measured on both 
a defect-free (i.e., healthy) and a defective bearing of the same model (SKF 6220), 
under the same operation conditions: shaft speed /r — 600 rpm (corresponding to 10 
Hz rotational frequency), axial load of 7,038 N, and radial load of 17,468 N. 
Figures 7.5 and 7.6 shows the two signals in the time and frequency domains, 
respectively, while the related frequency resolution is approximately 0.3 Hz. 

As shown in Fig. 7.6, both spectra indicate the existence of two dominant 
frequency components: (1) ball rotation (/bpfoI' ball passing frequency on outer 
raceway, with the subscript "1" referring to the 6220 bearing) and (2) bearing 
misalignment (/„,)■ Ball rotation-related vibration has a peak value at the funda- 
mental frequency /bpfoi = 41 Hz, which is equal to 4.1 f^. Misalignment-related 
vibration has a characteristic frequency of/„ = 20 Hz ( = 2/^). In addition to these 
two major components, components related to bearing imbalance are also identified 
at the frequency /u = 10 Hz (identical to /r). However, visual comparison of the 
two spectra reveals no difference between them, as the characteristic frequency 
of /bpfii = 58.6 Hz, related to the inner raceway defect, is not recognized. 
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Signals from a healthy bearing 




0.4 0.5 0.6 

Time (s) 

Signals from a defective bearing 

Fig. 7.5 Time domain signals from a healthy and a defective hearing, (a) Signals from a healthy 
bearing and (b) signals from a defective bearing 



116 



7 Wavelet Integrated with Fourier Transform: A Unified Technique 



a 

0.04 

N 

E 0.03 
I 0.02 
m 0.01 







faPFOi 




f„ 




- 


f:, 




J 


V 



10 



20 



50 



60 



30 40 

Frequency (Hz) 

Power spectrum density from a healthy bearing 



70 







•'BPFOl 


5 0.06 
|0.04 






~ 






- 


8 0.02 

0. 






- 



10 20 30 40 50 60 70 

Frequency (Hz) 

Power spectrum density from a defective bearing 

Fig. 7.6 Results of frequency domain analysis of the healthy and defective bearing, (a) Power 
spectrum density from a healthy bearing and (b) power spectrum density from a defective bearing 



This illustrates that Fourier transform, when applied alone, may not be effective in 
detecting the existence of bearing structural defect. 

The same signals are then analyzed using discrete wavelet transform, with the 
Daubechies 2 wavelet as the base wavelet. Figure 7.7 illustrates the wavelet 
coefficients of the vibration signals at the decomposition level 7, which has a 
corresponding frequency range of 39-78 Hz, thus covering the defect characteristic 
frequency of/eppn = 58.6 Hz. As seen in Fig. 7.7, the wavelet coefficients for the 
defective bearing have shown more dynamical variations than that of the healthy 
bearing. To quantify their difference, the root-mean-square (RMS) values of the 
two wavelet coefficients are calculated. It is found that the RMS value of the 
defective bearing (56 mV) is about 145% larger than that of the healthy bearing 
(22.8 mV). Although such an increase can be used as an indicator of structure defect 
in the bearing, it has the limitation that proper threshold needs to be set up a priori, 
to determine the quantitative extent that distinguishes a healthy bearing from a 
defective one. Another limitation of the wavelet transform is that the wavelet 
coefficients do not provide any indication on the specific location of the defect in 
the bearing, since it does not explicitly reveal the characteristic frequencies from 
the bearing. 

Next, the bearing signal (as shown in Fig. 7.5) is analyzed using the unified 
time-scale-frequency method. For this purpose, the wavelet coefficients of the 
signal (shown in Fig. 7.7) is postprocessed using the Fourier transform. 
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Fig. 7.7 Wavelet decomposition of bearing signals at decomposition level 7. (a) Wavelet coefti- 
cient from a healthy bearing and (b) wavelet coefficients from a defective bearing 
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Fig. 7.8 Unified analysis for defect feature extraction at decomposition level 7. (a) PSD of the 
healthy bearing and (b) PSD of the defective bearing 
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Comparing the spectra of the healthy (Fig. 7.8a) and defective bearings 
(Fig. 7.8b), it is seen that the inner raceway defect can be clearly identified by its 
characteristic frequency at /bpfh — 58.6 Hz. No distinctive peak is seen in the 
spectrum of the healthy bearing at this frequency. The spectrum further indicates 
several other major peaks at/^ = 20 Hz,/bpfoi = 41 Hz, and/eppoa = 56.5 Hz. 
These are reflective of misalignment (at 20 Hz) of the defective bearing and 
rotational characteristic of other bearing. For example, /bpfoi = 41 Hz is the ball 
passing frequency of the type 6220 bearing, and /bpfo2 = 56.5 Hz is found to be 
related to ball rotation of a different bearing (cylindrical bearing type 2322 with its 
vibration component indicated by the subscript "2"). This is based on the para- 
meters of Z = 14, D — 33.5 mm, and d^ — 175 mm, and the characteristic frequency 
of the type 2322 bearing is calculated asfspFii — 83.4 Hz,/bpfo2 = 56.5 Hz,/bsf2 = 
50. 1 Hz. This bearing structurally supports the rotating shaft in the bearing test bed. 
Comparing with the Fourier analysis and the wavelet transform, the new, unified 
time-scale-frequency technique has shown to be more effective in extracting 
bearing defect features. In that it not only reveals the existence of a localized 
bearing defect, but also the defect characteristic frequency that is indicative of its 
specific location (e.g., inner raceway). 



7.3.2 Selection of Decomposition Level 

When evaluating the unified technique, a particular decomposition level (e.g., level 
7) is chosen for the wavelet transform. The selection of an appropriate level is based 
on the signal sampling rate (or frequency) /sample and the defect characteristic 
frequency /chai-. The relationship is expressed as: 

/sample ^- /■ ^ /sample /T o l \ 

where L denotes the wavelet decomposition level. As an example, when the 
sampling frequency is /sample = 10 kHz, the frequency range associated with 
decomposition level L = 7 is calculated as 39-78 Hz. In Table 7.2, the frequency 
ranges covered by each of the decomposition levels under a 10 kHz sampling rate 
are shown. The essence of finding the best-suited decomposition level when 
wavelet transforming a dynamic signal is to ensure that its frequency range 
[/sampie/2^^', /sample/2^] covers the characteristic frequency of structural defect in 
the bearing with the highest likelihood, if such a defect exists. 

Table 7.3 lists the best-suited decomposition levels for analyzing bearing vibra- 
tion signals specifically related to a localized inner raceway defect, under various 
bearing rotational (or shaft) speeds. Since at 600 rpm, the defect characteristic 
frequency /bpfii = 58.6 Hz falls within the frequency range of 39-78 Hz, decom- 
position level 7 is chosen initially for data analysis. 
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Table 7.2 Frequency range associated with each decomposition level at a 10-kHz sampling rate 
Decomposition level Frequency range (Hz) Decomposition level (L) Frequency range (Hz) 



2,500-5,000 


5 


1,250-2,500 


6 


625-1,250 


7 


312-625 


8 



156-312 
78-156 
39-78 
19-39 



Table 7.3 Best suited decomposition level for inner raceway defect frequency detection 



Shaft speed (rpm) 



/bpfii(Hz) 



Decomposition level 



Frequency range (Hz) 



300 

600 

900 

1,200 

1,500 



29.3 


8 


58.6 


7 


87.9 


6 


117.2 


6 


146.5 


6 



19-39 

39-78 

78-156 

78-156 

78-156 



X 10 




30 40 

Frequency (Hz) 

Results from decomposition Level 6 



x10 




30 40 

Frequency (Hz) 

Results from decomposition Level 8 



70 



Fig. 7.9 Unified analysis using Daubechies 2 wavelet at levels 6 and 8. (a) Resutls from 
decomposition level 6 and (b) results from decomposition level 8 



The importance of choosing proper decomposition level is illustrated in Fig. 7.9, 
where the results of decomposing defective bearing signal at levels 6 and 8 are 
shown. It is seen that none of the two levels (combined with the postspectral analysis) 
are able to identify the defect characteristic frequency at/Bppii = 58.6 Hz, due to the 
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mismatch between their respective frequency range (level 6 at 78-156 Hz and level 8 
at 19-39 Hz) and the characteristic frequency of the inner raceway defect (at 58.6 Hz). 



7.3.3 Effect of Bearing Operation Conditions 

To investigate the effectiveness of the unified time-scale-frequency analysis 
method in defect feature extraction under varying bearing operating conditions, 
three groups of experiments are designed and conducted using a type 6220 ball 
bearing with a seeded defect. 



7.3.3.1 Variation of Radial Load 

The effect of radial loads on defect feature extraction is illustrated through the 
experimental results shown in Fig. 7.10, where four levels of radial loads are 
presented. It is noted that, as the radial load has progressively increased from 
4,367 to 26,202 N, the peak value of the bearing defect frequency of /bpfii( = 
58.6 Hz) has grown by 609.5%, as given in Table 7.4. Such an increase can be 
explained by the increased preload applied by the rolling elements of the bearing to 
the defect on the raceway. An increased preload enhances the impacts when the 
rolling elements roll over the defect, leading to increased amplitude of the defect 
feature. 



7.3.3.2 Variation of Axial Load 

Increase in the axial load has also shown to lead to an increase of the defect feature 
amplitude, as is evident when comparing the three different axial load conditions in 
Fig. 7.11. For example, when the axial load applied to the bearing increases from 
to 4,192 N, the defect signal amplitude at defect characteristic frequency /bpfh 
has increased by 4.7%, from 3.18 x 10^^^ to 3.33 x 10"^ W/Hz, as listed in 
Table 7.5. This is because the application of axial load on the bearing increases 
the extent of the bearing load zone distribution, resulting in an increased number of 
defect impacts within the load zone (Harris 1991). Such an increase enhances the 
overall energy content of the defect signal at its feature frequency, and is reflected 
by the increased defect feature amplitude. 



7.3.3.3 Variation of Rotational Speed 

The power spectral density graphs in Fig. 7.12 illustrate the effect of bearing 
rotational speed on the defect signal strength. As the speed increased from 300 
to 1,200 rpm, the defect frequency amplitude increased by 71.2%, as listed in 
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Fig. 7.10 Effect of the radial load on defect feature amplitude, (a) Radial load 4,367 N, (b) radial 
load 10,918 N, (c) radial load 17,468 N, and (d) radial load 26,202 N 
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Table 7.4 Effect of radial load on the defect signal amplitude (/bpfii) 



Radial load (N) 



PSD 10"' (W/Hz) 



Percentage of increase 



4,367 
10,918 
17,468 
26,202 



0.42 
0.99 

1.77 
2.98 



135.7 
321.4 
609.5 




10 20 30 40 50 60 

Frequency (Hz) 

Axial load N 



10 20 30 40 50 60 

Frequency (Hz) 

Axial load 5,240 N 



70 



5 


xlO 














' 


' 




' 










■'BPFOl 


fsPFIl 

1 




2.5 










Jbpfoi 




- 


n 




i. 


1 




. 1 




i 



70 



5 


xlO 


















fsPFOI 


fsp 


FIl 




2.5 














- 






4 


. L 




■'BPF02 




k 



Q 

a. 



10 20 30 40 50 60 70 

Frequency (Hz) 

Axial load 15,721 N 

Fig. 7.11 Effect of axial load on defect feature amplitude, (a) Axial load N, (b) axial load 5,240 
N, and (c) axial load 15,721 N 



Table 7.5 


Effect of axial load 


on the defect feature amplitude (/bpfh) 




Axial load 




PSD 10"^ (W/Hz) 


Percentage of increase 




5,240 
15,721 




3.18 
3.33 
3.54 


4.7 
11.3 
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Fig. 7.12 Effect of bearing speed on defect amplitude, (a) Speed 300 rpm (at decomposition level 8) 
(b) speed 600 rpm (at decomposition level 7), and (c) Speed 1,200 rpm (at decomposition level 6) 



Table 7.6 Effect of speed on the defect signal amplitude (/bpfii) 




Shaft speed (rpm) PSD 10^ (W/Hz) 


Percentage of increase 


300 1.56 
600 2.03 
1,200 2.67 


30.1 
71.2 



Table 7.6. The speed-related defect feature amplitude increase can be explained by 
the fact that with the increase of the speed, the number of impacts per bearing 
revolution increases proportionally, hence the total amount of defect impact energy 
also increases, leading to increased peak amplitude at the defect characteristic 
frequency /bpfii. 
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7.4 Summary 

This chapter introduces a unified time-scale-frequency analysis technique based on 
the combination of discrete wavelet transform with frequency domain postproces- 
sing. The effectiveness of this technique in improving bearing defect diagnosis is 
then investigated. A localized defect of 0.25 mm in diameter at the inner raceway of 
a type 6220 bearing has been successfully detected, under various bearing operation 
conditions. It is shown that the Fourier- transform-based spectral analysis technique 
alone is not reliable to detect the transient components that are characteristic of 
localized bearing defect, whereas wavelet transform alone does not explicitly 
identify the specific location of the defect. Thus, the unified technique combines 
the advantages of both the time and frequency domain analyses and provides more 
information on the defect feature than each of the techniques employed individu- 
ally. In addition to bearing defect diagnosis, the new technique provides a powerful 
tool for the detection, extraction, and identification of weak "defect" features 
submerged in vibration signals in a wide range of manufacturing equipment 
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Chapter 8 

Wavelet Packet-Transform for Defect 

Severity Classification 



Once a defect is detected, the next question that comes up naturally is how severe the 
defect is. Since machine downtime is physically rooted in the progressive degrada- 
tion of defects within the machine's components, accurate assessment of the severity 
of defect is critically important in terms of providing input to adjusting the mainte- 
nance schedule and minimizing machine downtime. This chapter describes how 
wavelet packet transform (WPT)-based techniques can classify machine defect 
severity, with specific application to rolling bearings. 



8.1 Subband Feature Extraction 

Because of the complex nature of machines and the intricacy of related parameters, it 
is generally difficult to assess the status of a machine directly from the measured time 
domain data. The general practice is to extract "features" to identify characteristics 
and patterns embedded in the data series that are indicative of status changes of the 
machine being monitored. The advent of wavelet transform has provided an effective 
tool for feature extraction from various time-varying signals, such as washing 
machines (Goumas et al. 2002), rolling bearings (Mori et al. 1996; Prabhakar et al. 
2002), and machine tools (Lee and Tang 1999; Li et al. 2000a, 2000b). As an extension 
of the wavelet transform, WPT provides more flexible time-frequency decomposi- 
tion, especially in the high-frequency region, when compared with the wavelet 
transform. In particular, WPT allows for feature extraction (e.g., energy content or 
kurtosis value) from subfrequency bands of the decomposed signal where the features 
are concentrated, thereby directing the computation to where it is most needed. Prior 
efforts have studied different sets of wavelet packet vectors to represent bearing 
vibration under different defect conditions (Liu et al. 1997). Altmann and Mathew 
(2001) found that features extracted from wavelet packets that cover the multiple 
subfrequency bands yield a higher signal-to-noise (S/N) ratio than those from a 
conventional band-pass filter. For multistage gearbox vibration analysis, the Hilbert 
transform and WPT were combined to enable gear defect detection at the incipient 
stage (Fan and Zuo 2006). 
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Given a time-domain signal x(t), the WPT decomposes it into a number of 
subbands, as expressed by the resulting wavelet packet coefficients: 

x{t) = J2^^;{t) (8.1) 

11=0 

In (8.1), the term x"{t) denotes the wavelet coefficients at the j level, n subband. 
From these coefficients, "features" will be extracted, at each subband, to provide 
information on the condition of the machine being monitored. 



8.1.1 Energy Feature 

The energy content of a signal provides a quantitative measure for the signal, which 
uniquely characterizes the signal. The amount of energy contained in a signal x{t) is 
expressed as: 



E4,) 



\x{t)\^dt (8.2) 



The energy content of a signal can also be calculated from the coefficients of the 
signal's transform. In the case of a WPT, the coefficients x"{t) quantify the amount 
of energy associated with each of the subbands. The total amount of energy 
contained in the signal is equal to the sum of the energy in each subband and 
expressed as: 

2'- 1 j. 

«=o ■' 

Since the energy content of each subband of the signal is directly related to the 
severity of the defect, it presents an indicator or feature of the machine's condition. 
From (8.3), the energy feature in each subband is defined as: 



E'! 



\x"{t)\^dt (8.4) 



Similarly, when a signal is represented by discrete, sampled values x(i) (i — 1, 2, 
. . ., M), the total energy feature in the subbands is calculated as: 



Ej-E-n'f (8-5) 
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8.1.2 Kurtosis 
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Kurtosis is a dimensionless, statistical measure that characterizes the flatness of a 
signal's probability density function. An impulsive signal that is peaked has a larger 
kurtosis value than a signal that is flat and varies with time slowly, as illustrated in 
Fig. 8.1. 

Mathematically, the kurtosis of a signal is defined as its fourth-order moment: 



K^(i) = 



E[{4t) - Av(r)) ] 



(8.6) 



'.v(/) 



where n^t,-. and <J^^^,'^ denotes the mean value and standard deviation of the signal x 
(t), respectively. The symbol £[•] denotes the expectation operation. Table 8.1 lists 
the kurtosis values of several representative signals. It should be noted that the 
value of kurtosis does not depend on the amplitude of a signal. 

For the wavelet packet coefficients in each subband, the corresponding kurtosis 
value is defined as: 



£[(A-;'(r)-A^,»(o)' 

^n ■' ./ ^ ' 



(8.7) 



'm 



where the symbols /,(,»(,) and ci^ll|^,■^ are the mean and standard deviation of the 

wavelet packet coefficients x'Ht), respectively. 

When the wavelet packet coefficients are sampled as x^(/), the kurtosis value is 



calculated as: 






(8.8) 



where the symbols /,(,»(,) and a^nf^i-^ are the mean and standard deviation of the 

wavelet packet coefficients x'Hi), respectively. 

Since the energy content of a signal provides a robust indicator of the signal, 
but is not sensitive in characterizing incipient defects, whereas the kurtosis 
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Table 8.1 Kurtosis values of ^— J Kurtosis 

several typical signals 

Square signal 1.0 

Sinusoidal signal 1 .5 

Gaussian signal 3.0 

Pulse signal >3.0 



value has high sensitivity to incipient defects but has low stability (Yan and 
Gao 2004), these two features can be combined instead of being used alone to 
improve the signal characterization. Suppose we decompose a signal into j levels 
(e.g., j = 4), which generates 2' or 2^* = 16 subbands. Given that the energy 
and kurtosis values are calculated from each subband, there will be 2 x 2* or 
2x2 =32 features extracted from the signal. These features can be expressed 
in a feature vector as: 

FV=[E^,EJ,...,Ej''\ Kf,Kl,...,KfV (8-9) 

For simplicity, (8.9) can be expressed as: 

fV = {/,|/=l,2,...,/7}, p = 2^"' (8.10) 

where /i = £°, . • . J,, = Ef-\fv+, = ^°' ■ ■ • Jp = ^;"'- 

Determining which features shown in (8.10) are most effective for character- 
izing machine defect is generally not a simple, straightforward process because 
the usefulness of features may be affected by factors such as the specific location 
of the sensors, and consequently, the quality of the signal measured in terms of 
the S/N ratio or signal contamination. Furthermore, using more features may 
not necessarily improve the effectiveness of defect severity estimation, while 
increasing the computation cost (Malhi and Gao 2004). Since defect-induced 
signals are typically reflected in the variation of the characteristic frequencies 
(e.g., characteristic defect frequency shifts as the defect size grows), degradation 
of machine conditions is predominantly reflected in certain subfrequency bands, 
whereas other subfrequency bands contain information unrelated to the defect. 
This indicates that feature selection is needed for identifying significant features 
from the pool of WPT-based feature set. 



8.2 Key Feature Selection 

This section introduces two feature selection methods: Fisher linear discriminant 
(FLD) analysis and principal component analysis (PCA). The goal is to differentiate 
the signals (which represent different machine defect severity) by examining only 
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those subbands with distinct feature discrimination than others, thus improving the 
efficiency while not missing critical information related to the signal, to ensure 
reliability of the diagnostic operation. 



8.2.1 Fisher Linear Discriminant Analysis 

Distance measures, such as the Bhattacharyya distance, Kolmogorov distance, and 
FLD (Fukunaga 1990; Yen and Lin 2000; Duda et al. 2001), have been applied to 
components differentiation within a class pair. Generally, the greater the distance 
between two feature components within a class pair is, the easier it will be to 
separate them. Ilustrated in Fig. 8.2 are two feature components representing the 
two signals from a healthy and a defective bearing, respectively. The features in the 
right-hand section of the figure are easier to be separated than those in the left-hand 
section because the probability distributions of the two components do not overlap, 
because of their relatively smaller standard distributions and larger distance 
between the mean values, compared with the two features in the left section. The 
result is that this pair of features has a higher discriminant power than the other pair 
of features. 

The approach introduced here for efficient feature selection is to evaluate the 
discriminant power of each individual feature within a class pair. Features that have 
a low discriminant power are excluded from the data analysis process, as they 
contain little useful information. Such an approach can be realized by examining 
the rank order (Kittler 1975) of the feature vector FV = {//|/ = 1,2, . . . ,/?}, shown 
in (8.10), as: 



J{h)>J[f2)>--->J{fd)>--->J{fp) 



(8.11) 



where ,/(•) is a criterion function for evaluating the discriminant power of a specific 
feature. Here the utility of the FLD is introduced (Duda et al. 2000), where the 
criterion function for differentiating a class pair is given by: 



healthy damaged 




healthy damaged 




Feature with overlapping Feature without overlapping 

Note: /j^, /X2 '■ mean value, ctj, a2 : standard deviation 



Fig. 8.2 Feature discrimination based on the distance between constituent components, (a) Feature 
with overlapping, (b) Feature without overlapping. Note: 1^1,^2 ■ mean value, (Ti,f72 : standard 
deviation 
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JfXUJ) 



\t'i.f,-Hf>\ 



(8.12) 



'iji 



Jjl 



cPj f represent the mean values and the variances of 



The symbols /i,- ^ , /i r and CTJf,u-f 
the /th feature,//, and for the classes / andy, respectively. Since typically more than 
two defect severity levels need to be evaluated in a machine defect diagnosis 
system, a ^-class, p feature component problem with k{k — l)/2 class pairs is 
investigated for generality. The process for feature selection, based on the FLD 
analysis method, is illustrated in Fig. 8.3. 

Features (i.e., subband energy and kurtosis values) are first extracted by 
means of the WPT from the signals measured on the machine (e.g., a milling 
machine, a spindle), under various operating conditions (e.g., speed and load). 
The mean values and variances of each individual feature // corresponding 
to each machine status are then calculated, for each set of operation conditions 
(e.g., 1,200 rpm, 3.6 kN radial load). For each possible class pair 
{(/,y)|/ = 1,2, . . . ,^ — 1,/ = ; + l,i + 2, . . . ,^} formed from two different 
machine states (e.g., health vs. light defect), the discriminant power measure //,(',./) 
for each feature//, is calculated, using (8.12). Descending sorting /y;(/,j) yields: 




Sub-band feature// extraction 



^ 



Condition pair (ij) selection 



JJ 



p 


Discriminant power Jf (i,j) calculation 




^ 




Feature subset Fjj selection 




Y ^^^i^ext class^^^^ 



pair exist ? 



Final feature set iY;,,«/ selection 



Fig. 8.3 Flowchart of the 
FLD feature selection process 
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i/, (/,./) > Jf, {i,.l) >--->Jf, (',,/) >•••>//„ i'J) (8.13) 

The first group of d features that have the highest relative Jfi{i,j) values are chosen 
to form the feature subset F,^, for each class pair {ij) : 

F,,j = {f,\l = 1,2, . . . ,«'}, / = 1,2, . . . ,^ - 1; J = / + 1, / + 2, . . . ,^ (8.14) 

The final feature set is obtained by taking the union of each feature subset across all 
the class pairs as: 

^/"-' = 10 U ^'4 (8.15) 

This feature set is subsequently selected for the machine defect severity classification. 



8.2.2 Principal Component Analysis 

PCA, as a multivariate statistical technique, has been intensively studied and 
utilized as an effective tool for process monitoring (Kano et al. 2001), structural 
damage identification (De Boe and Golinval 2003), and machine health diagnosis 
(Baydar et al. 2001; He et al. 2008). This is due to its ability in dimension reduction 
and pattern classification. In general, the PCA technique seeks to determine a series 
of new variables, called the principal components, which indicates the maximal 
amount of variability in the data with a minimal loss of information (Jolliffe 1986), 
to best represent the data in a least square sense. 

Suppose there are m feature vectors FV,(i = 1,2,...,ot) extracted from m 
signals, respectively. A p x m feature matrix X can then be formulated as: 

X^[FVuFV2,...,FV,„] (8.16) 

where the symbol FV denotes a /j-dimensional feature vector as shown in (8.10). 
Depending on the decomposition level j of the WPT, the dimension of the feature 
vector is determined as p = 2-'+' . Correspondingly, a scatter matrix S is constructed 
from the feature matrix X as: 

S = E[{X ~ X){X ^ Xf] (8.17) 

Where X is the mean value of X, and £[.] is the statistical expectation operation 
(Duda et al. 2000). Performing singular value decomposition on the scatter matrix 
leads to: 
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S^AAA' 



(8.18) 



where Ais ap x p matrix whose columns are the orthonormal eigenvectors of the 
scatter matrix, and A^A — I p. The symbol A is a diagonal matrix whose diagonal 
elements li > /.2 > • • • > "^p are the eigenvalues of the scatter matrix. Since the 
eigenvector in matrix A with the highest eigenvalue (i.e., a.\ in the diagonal matrix A ) 
is the first principle component of the /7-dimensional feature vectors, it is better-suited 
than any other feature vectors as the representative feature that identifies the condition 
of the machine being monitored, for example, defect severity of a bearing. As a result, 
PCA ranks the order of eigenvectors by means of their respective eigenvalues, from the 
highest to the lowest. Such a ranking sequence reflects upon the order of significance of 
the corresponding components. By examining the accumulated variance (e.g., 90%) of 
the principle components, which is defined as: 



var = 




X 100% 



(8.19) 



a lower-dimensional feature vectors Y can be constructed as: 
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Fig. 8.4 Simulated data of (/i,/2,/3,/4) for developing a feature selection scheme 
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Y^Al^^X (8.20) 

where q<p, and Ap^^ is the first q columns of A. 

Given that the features transformed by the principal components are not directly 
connected to the physical nature of the defect, the eigenvectors in Apxq for the 
transformed features are only used as the basis for choosing the most significant 
features from the original p-dimensional feature vectors. This is explained by 
means of a numerical simulation. As illustrated in Fig. 8.4, four normalized feature 
vectors, /i,/2,/3, and/4, are constructed with each of them forming clusters around 
four distinct levels of magnitudes. A total of 100 samples are considered for each of 
the four features, hence each feature is a 100-by-l vector. The four features are 
simulated to have random variations from the same mean for each of the four 
clusters. This is similar in principle to the variation of a measured data feature for 
four different defect severities. Each of the four clusters for each feature contained 
25 data points. The four features become less clearly differentiated from/i to/4, as 
overlap between the clusters increases. 

It is evident that a suitable feature selection scheme should be able to rank/1,/2, 
/3, and/4 in the same order as shown in Fig. 8.4. To derive the principal components 
for the simulated data set, the four normalized features are collected in a 4-by-lOO 
matrix X: 

X = [fuf2j3,.Uf (8.21) 

The eigenvalues and the eigenvectors are calculated from the scatter matrix S. The 
matrix of eigenvectors can be represented as A = [fl,j], where / = 1 to 4, and/ = 1 to 4. 
The eigenvector 134 consists of four components from the fourth column of the matrix 
Aasa4 = [ai4 02,4 «3,4 04,4]. Similar arrangement applies to ai, 02. and 133 (i.e., 
ai = [ai,i 02,1 03,1 fl4,i], 02 = [ai,2 02,2 03,2 04,2], and 

013 = [fli,3 02.3 03,3 «4,3 ] ), respectively. The matrix A is a 4 x 4 square matrix 
because of the presence of the four features/1^4. The eigenvector corresponding to the 
eigenvalue with the largest magnitude is chosen. As shown in Table 8.2, one of the 
four eigenvalues of the data set is much larger than the other three, indicating that most 
of the variance is concentrated in one direction. Table 8.3 lists the component 
magnitudes for the eigenvector corresponding to the largest eigenvalue. Since this 
corresponds to 04, the feature that is responsible for the maximum variance in the data 
is thus identified. 

Subsequently, the magnitudes of the four components of 64 are examined. As 
shown in Table 8.3, |ai,4| > |a2,4| > 1^3,41 > |'24,4|- This result can be interpreted in 
terms of the directionality of the eigenvector (04) in the original feature space. If the 
unit vectors for the original feature space are represented as Mi, M2, M3, and M4 (where 
Ml = [1 0]^, M2 = [0 1 0]^, etc.), then a higher magnitude of 0,4 denotes 
the similarity in direction of the eigenvector 04 with m,, when compared with the 
other unit vectors forming the basis for the original feature space. For the simulated 
data, the component 014 has the largest magnitude, followed by 02,4^ '33,4. and 044. 
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Table 8.2 Eigenvalues for T 024T 

simulated data , „ ^^, 

>.3 1.318 

Xa 32.023 



Table 8.3 The fourth 
eigenvector component 
magnitudes for simulated 
data 



Component Magnitude 

ai.4 0.599 

(32,4 0.523 

(33,4 0.439 

^4.4 0.417 



Thus, the feature represented along Mi is the most sensitive, followed by those along 
U2, M3, and M4. As a result, the PCA-based scheme ranks the four features /1-/4 as 
desired and selects most representative features. 



8.3 Neural-Network Classifier 

Once a suitable feature set (e.g., 6) is chosen from the extracted features (e.g., 32), 
the machine defect severity levels can be evaluated by means of a status classifier. 
Neural network as a classifier has been applied to machine health diagnosis, for 
example, for classifying rotating machines with imbalance and rub faults (McCor- 
mick and Nandi 1997), bearing faults (Li et al. 2000), and tank reactor operation 
states (Maki and Loparo 1997). In general, a neural network consists of multiple 
layers of nodes or neurons, and each layer has a number of parallel nodes that are 
connected to all the nodes in the succeeding layer through different weights 
(Haykin 1994). Using a training algorithm, the weights are adjusted such that the 
neural network responds to the inputs with outputs corresponding to the severity of 
a structural defect at the output layer. Figure 8.5 illustrates the architecture of a 
feed-forward neural network. For the ith layer of links, the symbols iv*'', Z»*'', x'''\ 
and y'''"' represent a vector of weights between the layers, node biases of a layer, 
inputs of nodes at one layer, and output at the output layer, respectively. At the 
output layer, a linear neuron is used to produce an output to indicate the machine 
defect severity level. 

The weights of a multilayer feed-forward neural network are continuously 
updated, while it is trained with training data consisting of a set of machine defect 
feature input vectors (x) and known output (d). This is realized by minimizing the 
error between the computed output of the network and the known output in the 
training process. Consider n pairs of input and output training data { (Xp, dp)\,p — 1, 
2, . . . , « j . For the pth pair data {Xp,dp], the mean square error (MSB) of the network 
output yp is expressed as: 
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(8.22) 



m— 1 



where m is the number of nodes at the output layer. Assuming that each input vector 
corresponds to a single severity value, the value of m is 1 . For the entire training 
data set, the total error Err, i.e., the learning error, is expressed as: 



Err = J2ep^J2J2^'^P'"~ yp"''^^ 



(8.23) 



In the training process, the learning error is minimized through continuously 
updating the connection weights in its structure with certain learning rule. After 
training with the training data, the designed network with the resulting connection 
weights generalize the relationship between the input and output to correctly 
classify new input data. When input feature vectors associated with a defective 
measurement occur, for which the network is however not trained, the neural 
network will interpolate a defect severity by the location of the new input in the 
space spanned by the training data. 

There are several gradient-based teaming rules to minimize the learning error Err 
by changing the connection weight w of the multilayer feed-forward neural network. 
Different leaming rules differ in how they use the gradients to update the weights w in 
training. Steepest decent with fixed leaming rate is the traditional leaming rule of the 
neural network, in which the weight h'*^'*^^' between the ^h layer and {k + l)th layer is 
tuned for each epoch, along the gradient direction by an amount: 
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dE 



Aw**) = 



-77 ■ 



a>v(*') 



(8.24) 



where rj, referred to as the learning rate, is fixed between and 1. The learning rate 
affects the convergence speed and stability of the weights during learning. Gener- 
ally, the training error decreases slowly with a learning rate 77 close to 0. On the 
contrary, the error may oscillate and not converge if 77 is close to 1. To speed up the 
training process for the steepest descent method, (8.24) is modified by adding a 
momentum term (Haykin 1994), expressed as: 



BF 



(8.25) 



where Awprev is the previous adjust amount, and the momentum constant a is set 
between 0.1 and 1 in practice. The addition of this momentum term smoothes weight 
updating and tends to resist erratic weight changes (Haykin 1994). After the network is 
trained with representative data, it is able to evaluate the new measurement data and 
classify them according to the rules that it has learned through the training data set. 



8.4 Formulation of WPT-Based Defect Severity Classification 

Utilizing the advantages of the WFT in subband signal decomposition, this section 
introduces a WPT-based defect severity classification algorithm, with assistance from 
the neural-network classifier introduced in Sect. 8.3. It starts with a WPT on the 
measured signals. After statistical features are extracted from the wavelet coefficients 
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Fig. 8.6 Procedure of the WPT-based defect severity classification 
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in each subband, key feature selection process is performed to determine the most 
significant features from the feature set, which are subsequently used as input to a 
neural network-based classifier for defect severity classification. Figure 8.6 illustrates 
how the developed technique is realized. The left side of Fig. 8.6 depicts the training 
process of the hybrid technique in a manner of supervised learning (i.e., based on the 
available reference data, denoted as signal 1, ...,«). In addition to providing inputs 
for constructing the neural-network classifier model, the results from the feature 
selection process are used to guide the feature vector selection in the evaluation 
process. The right side of Fig. 8.6 describes an evaluation process of the WPT-based 
algorithm. An input signal is passed through the process of signal decomposition, 
feature extraction, and selection. Eventually, the corresponding defect severity level 
is determined through the neural network classifier. 



8.5 Case Studies 

The application of the above described wavelet packet-based machine defect 
severity classification algorithm is described through two case studies. 



8.5.1 Case Study I: Roller Bearing Defect Severity Evaluation 

The first case study is to evaluate the defect severity level of a set of roller bearings 
(N205 ECP) with and without seeded defects. Specifically, vibration data were 
measured from both a new, "healthy" bearing that served as a reference base and 
three defective bearings containing localized defects of different sizes: 

(a) one 0. 1 mm diameter hole in the outer raceway 

(b) one 0.5 mm diameter hole in the outer raceway 

(c) one 1 mm diameter hole in the outer raceway 

In Fig. 8.7, segments of vibration signals measured from the healthy and defective 
bearings are shown. 

To provide sufficient training and testing data sets to the neural-network classi- 
fier, a total of 240 vibration data sets were collected under a bearing rotating 
speed of 1,200 rpm and a radial load of 3,600 N. For each operation condition, 
60 data sets were collected. Each data set was first decomposed by the WPT. 
Analysis has shown that features extracted from a four-level decomposition 
provided adequate information to differentiate the four defect conditions from 
each other (Gao and Yan 2007). On the basis of the information collected in the 
16 sub frequency bands (since 2"* = 16 ), a feature vector was subsequently con- 
structed, which contained 32 feature elements (i.e., 16 subband energy values and 
16 subband kurtosis values). 
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Fig. 8.7 Vibration signals 
measured from roller bearings 
with different conditions, (a) 
Signal from a healthy bearing, 
(b) Signal from a defective 
bearing (0. 1 -mm diameter 
hole), (c) Signal from a 
defective bearing (0.5-mm 
diameter hole), (d) Signal 
from a defective bearing 
(1-mm diameter hole) 
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FLD analysis is then applied for feature selection. The means and variances of the 
feature element,//, are obtained for each of the four bearing conditions. Table 8.4 
summarizes the discriminant power of the extracted features for different condition 
pairs, based on the Fisher discriminant criterion. The first three key features within 
each condition pair, for example, E^^, E\^, and E^'^ for the condition pair (healthy, 
light defect), are selected, and the final feature set is obtained through a union 
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Table 8.4 Discriminant power of the extracted features for various condition pairs in different 
subbands 



Subband 
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operation from all the six condition pairs as listed in Table 8.5, where the energy 
features £4, E^, E^^, E^^, E'^^ , and £4^ are selected as the most representative 
features, because they possess higher discriminant power than others as listed in 
Table 8.4. 

Next, the PCA technique was performed on the feature vectors. As seen in Fig. 8.8, 
the first five principal components represent over 90% variance, which preserves most 
of the information contained in the original feature set (JoUiffe 1986). This is 
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Table 8.5 Final feature set obtained through a union operation by Fisher linear discriminant 

analysis 
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Fig. 8.8 Accumulated variance of principal components for the tested bearings 

considered sufficient for constructing the corresponding subspace principle 
component matrix (based on their corresponding eigenvectors) for choosing the 
features from the original feature vector. Table 8.6 Hsts the eigenvectors that corre- 
spondto the first five principal components. By searching for those components with 
the largest magnitude in each eigenvector, the corresponding energy values £4 and E^, 
kurtosis values K^, K^, and Kl^ are identified as the most representative features. 

The selected feature set was input to a multiplayer perception (MLP) neural 
network for bearing defect severity classification. Since different ratios (e.g., 
70-30 or 50-50) for the training and testing data were suggested for neural 
network-based classifier in the literature (Paya et al. 1997; Jack and Nandi 
2001), but no single fixed ratio has been preferred, two thirds of the data sets 
corresponding to each condition were used for training the classifier, and the 
remaining one third for performance checking, from a total of 240 collected data 
sets. This was aimed at providing sufficient training data to ensure accuracy of the 
classifier. The classification rates are listed in Table 8.7. When the FLD-selected 
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Table 8.6 The first five eigenvectors of the extracted features for the roller bearing 
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Table 8.7 Neural-network classifier results of the roller bearing 



Classification 


WPT features 


WPT features 


WPT features 


Raw data 


rate 


with FLD (%) 


with PCA (%) 
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features (%) 
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Overall 
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feature set was used as input to the MLP classifier, only 5% of measured data with the 
0.1 -mm hole in the bearing outer raceway is misclassified, out of the whole test data. 
This led to 98% overall classification success. When the PCA-selected feature set was 
used as the MLP input, a classification rate of 92% was identified, which is lower than 
the FLD-selected feature set. The rate, on the contrary, is still higher than the rates 
obtained using WPT features only as the input to MLP (88%) and using raw data 
features as the input (83%). This illustrates that the WPT-based feature extraction and 
selection method is effective in defect severity classification. 



8.5.2 Case Study II: Ball Bearing Defect Severity Evaluation 

For the second case study, a run-to-failure experiment was conducted on a deep 
groove ball bearing (type 1 lOOKR) of 52 mm outer diameter, under a radial load of 
5,498 N. The bearing contained a 0.27-mm wide groove across its outer raceway as 
an embedded defect, and was continually run under a rotational speed of 2,000 rpm. 
Upon reaching approximately 2.7 million revolutions, the defect has propagated 
throughout the entire raceway and rendered the bearing practically nonfunctional. 
This case study was designed to investigate the effect of continuous degradation of 
the defect, whereas case study I discussed above concerns with the effect of discrete 
defects. 

Vibration signals were collected during the experiment at an interval of every 
7 min. Figure 8.9 illustrates the trend of the vibration amplitude along the process of 
defect propagation. Three vibration signals are also shown in Fig. 8.10, and they are 
measured right after the bearing is physically examined at different test stages. For 
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Fig. 8.9 Amplitude as a function of the ball bearing revolution 
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Fig. 8.10 Accumulated variance of principal components for the ball bearing 1 lOOKR 



purpose of defect severity evaluation, all the collected vibration data sets are 
divided into three sections with the threshold values of the amplitude being set at 
0.6 and 0.9 V, respectively. As shown in Fig. 8.9, the three sections correspond to 
three different defect propagation phases during the run-to-failure test. It should be 
noted that, since no prior knowledge was available regarding the relationship 
between the vibration amplitude and the defect severity level, the choice of three 
phases investigated here is empirical. 

The vibration signals were first decomposed into 16 subbands. The energy and 
kurtosis features were then calculated from the wavelet packet coefficients in each 
subband to formulate the feature vectors. FLD analysis was then used for feature 
selection. The means and variances of the feature element,//, were obtained for each 
of the four bearing initial conditions. Table 8.8 summarizes the discriminant power 
of the extracted features for different phase pairs, based on the Fisher discriminant 
criterion. The first three key features within each phase pair were selected and the 
final feature set was obtained through a union operation among different phase pairs 
(i.e., phase I, phase II; phase I, phase III; and phase II, phase III). As listed in 
Table 8.9, the kurtosis features K^, K\, kI, K^, and K^'^ are selected as the 
most representative features. 

When the PCA was performed on the extracted subband feature vectors, the 
first two principal components shown in Fig. 8.10 represent over 90% variance, 
which was subsequently used as the reference features from the original feature 
set. Table 8.10 lists the eigenvectors corresponding to the first two principal 
components. The highest magnitude in the first eigenvector was found to be 
associated with the first component, and the highest magnitude in the second 
eigenvector was seen to be related to the third component. Accordingly, the 
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Table 8.8 Discriminant power of the extracted features for the ball bearing phase pair in various 
subbands 
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Table 8.10 The first two 
eigenvectors calculated the 
extracted features for the ball 
bearing 
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Table 8.11 Results of neural-network classification rate of results of the ball bearing 
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energy value at subbands 1 and 3 were (denoted as £4 and £4 ) identified as the 
most representative features. 

Following the same procedure as described in case study I, 2/3 of the data sets 
corresponding to each defect propagation phase are used for training the MLP 
classifier, and the remaining 1/3 data points are used for performance checking. As 
shown in Table 8.1 1, when the features selected from the FLD approach were used 
as input to the MLP classifier, the classification rate for each phase is found to be 
92%, 91%, and 94%, respectively. This led to the overall classification rate of 92%. 
In comparison, when features selected using PCA technique were used as input to 
the MLP, the classification rates were lower, 87%, 84%, and 88%, respectively. 
Furthermore, when feature set extracted from each subband and raw data was 
directly used as the MLP input, the classification rates dropped down to even 
lower values (e.g., 82% overall classification rate for WPT features only, and 
78% overall classification rate for raw data features). This indicates again the 
effectiveness of the presented approach for defect severity classification. 



8.6 Summary 

This chapter introduces a wavelet packet-based signal processing approach for 
machine defect severity classification. After the subband energy and kurtosis 
features are extracted from realistic vibration signals using the wavelet-packet 
coefficients, the most representative features are chosen using the Fisher discrimi- 
nant criterion and principal feature analysis, respectively. These features are used as 
inputs to the neural-network classifiers to evaluate the machine defect severity. The 
effectiveness of the approach has been experimentally verified through two case 
studies for rolling bearing defect severity classification. It is shown that the intro- 
duced approach provides a practical way for feature extraction and selection. 
In addition to bearing defect severity classification, this approach is applicable to 
classifying the working states of other machines and machine components, thus 
providing a useful tool for machine condition monitoring and diagnosis. 
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Chapter 9 

Local Discriminant Bases for Signal 

Classification 



The goal of analyzing signals from manufacturing machines is to extract relevant 
features from the waveforms to effectively characterize the working conditions of 
the machines (e.g., tool breakage and gear degradation). As we have shown in 
Chap. 5, the wavelet packet transform can lead to redundant signal decomposition 
within certain time-frequency subspaces. When performing wavelet packet trans- 
form, the time-frequency subspaces are collectively called the wavelet packet 
library. Each of the subspaces is denoted as a wavelet packet node. Such a way of 
signal decomposition provides the possibility of selecting a particularly suited set of 
wavelet packet nodes out of the wavelet packet library for a specific signal analysis 
task, such as data compression, regression, or classification (Saito 1994). However, 
the choice of wavelet packet nodes is dependent on the specific task. For example, 
the optimal wavelet packet transform technique introduced in Chap. 5 is geared 
toward signal compression (Coifman and Wicherhauser 1992), in which the wave- 
let packet nodes are selected based on minimizing an information cost function 
(e.g.. Shannon entropy). This chapter introduces how to choose a good set of 
wavelet packet nodes from a wavelet packet library, for purpose of signal classifi- 
cation. Such a technique has shown to be effective for monitoring and diagnosis of 
rotating machines. 



9.1 Dissimilarity Measures 

To classify signals obtained from a machine under different working status, the 
features extracted from the signals should clearly differentiate different working 
status of the machine, where each status is considered as a distinct class. For 
example, signals measured on a new gearbox are denoted as one class, while signals 
measured on a gearbox with broken-teeth are denoted as another class. Such type of 
features is referred to as "discriminant" features of the signal. The main objective of 
signal classification by using the wavelet packet transform is to find an optimal set 
of wavelet packet nodes (each node representing a wavelet packet basis) that are 
capable of discriminating different classes as effectively as possible. This can be 
achieved by decomposing the signal of interest into different classes, using the local 
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discriminant bases (LDB) algorithm (Saito 1994; Saito and Coifman 1995). The 
optimal choice of LDBs depends on the nature of the signals and the dissimilarity 
measures used to distinguish classes. In general, the dissimilarity measure is aimed 
at evaluating the "statistical distances" of each wavelet packet node among differ- 
ent classes. Numerous dissimilarity measures have been developed (Basseville 
1989; Saito 1994; Saito et al. 2002; Umapathy and Krishnan 2006; Umapathy 
et al. 2007), among which the following four measures have been typically asso- 
ciated with the application of the LDB algorithm. 



9.1.1 Relative Entropy 

Relative entropy is one of the first dissimilarity measures used for identifying the 
LDBs (Saito 1994). On the basis of the definition of relative entropy, this dissimi- 
larity measure is defined as: 

^i(p',/'') = E/'Mog4 (9.1) 

,=1 Pi 

where Y^^^^p] = 1 and Y^"^iPJ = 1. The symbols/?' and p^ denote nonnegative 
sequences, respectively. It is assumed that log == — oo, log(x/0) = +oo for 
x> 0, and x (±oo) = 0. The discriminant information Di(p^, p ) between these 
two sequences measures how differently p' and p^ are distributed. From the defini- 
tion, it is seen that the nonnegative sequences p' and p^ can be considered as 
probability density function. Since the normalized energy of wavelet coefficients 
(i.e., representation of energy distribution) within each wavelet packet basis is 
actually an expression of the probability density function associated with that wavelet 
packet node, it can be used as replacement in (9.1). Consequently, the relative 
entropy can be used as a dissimilarity measure in the LDB algorithm. Furthermore, 
(9.1) indicates that the relative entropy measure Di is nonnegative and will be zero if 
the two sequences of/?' and p'^ are the same. The more separate from each other the 
two sequences are, the higher the relative entropy measure Di will be. However, it 
should be noted that the relative entropy measure shown in (9.1) is only applicable to 
a two-class problem. For multiple-class problems (e.g., gearbox under four different 
conditions: such as (a) faultless, (b) slight-wom, (c) medium-worn, and (d) broken- 
teeth), the dissimilarity measure based on relative entropy is modified as: 

where L is the number of classes. Equation (9.2) indicates that the dissimilarity 
measure of multiple-class problems is the summation of relative entropy for each 
pair of two-classes among all the classes. 
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9.1.2 Energy Difference 

From the decomposition results of a signal's wavelet packet transform, the normal- 
ized energy associated with the wavelet packet node (/, k) is calculated as: 

v^M I i2 

^.^^l^MmL (9.3) 

In (9.3), the symbols 7 and k represent the wavelet packet decomposition level and 
subfrequency band, respectively. These two symbols, collectively, represent a 
wavelet packet node ( /, k). The symbol xy j./ denotes the /th wavelet packet coeffi- 
cient within the node (/, k), and the symbol M denotes the total number of 
coefficients within that node, fi^f,) is the total energy contained in the signal. 

The difference in the normalized energy associated with wavelet packet node 
(y, k) between the signals from two classes (denoted as class 1 and class 2) can be 
defined as a dissimilarity measure, given by: 

D2{E\E^)=El~E% (9.4) 

The symbols iik and E^^. represent the normalized energy associated with wavelet 
packet node (/, k) from class 1 and class 2 signals, respectively. Since each wavelet 
packet node corresponds to a time-frequency subspace, the normalized energy 
computed at a node provides the energy distribution of the signal in a particular 
subfrequency band. The greater the difference at a particular node in the energy 
distribution of the two classes, the more significant the node for discriminating the 
classes will be. Similar to the definition expressed by (9.1), (9.4) represents the 
energy difference measure used for a two-class problem. For multiple-class pro- 
blems, the dissimilarity measure based on the energy difference is expressed as: 

D2{{E"'t^,)=Y,Y.D2{E\E') (9.5) 

a=l h=a+\ 

where L is the number of classes. Equation (9.5) indicates the dissimilarity measure 
of multiple-class problems is the summation of energy difference for each pair of 
two-classes among all the classes. 



9.1.3 Correlation Index 

The dissimilarity measure can also be defined from the correlation between the 
wavelet packet node (/', k) from class 1 and class 2. This measure can be used to 
identify those nodes that can detect the difference in the temporal characteristics of 
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the signals between class 1 and class 2. The dissimilarity measure based on the 
correlation index, which is used in a two-class problem, is formulated as: 

D,{x\x') = {xI,j,xIj) (9.6) 

where the symbols /, k, and / represent decomposition level, subfrequency band, and 
time position, respectively, and xk, and xj/,/ are the coefficients of the 
corresponding wavelet packet node (/', k) of class 1 and class 2. An average low 
correlation index at a particular node indicates high dissimilarity between the 
classes. Similarly, for a multiple class problem, the dissimilarity measure based 
on correlation index is expressed as: 



/)3(K'}Li) = E E^3(^",A-'') (9.7) 

a=l h=a+l 

where L is the number of classes. Equation (9.7) indicates that the dissimilarity 
measure of multiple-class problems is the summation of correlation index for each 
pair of two-classes among all the classes. 



9.1.4 Nonstationarity 

Nonstationarity of the wavelet packet coefficients may also be used to measure 
the dissimilarity. It is computed as the set of variances along the segments of the 
wavelet packet coefficients at a given node (/, k). The ratio of this variance between 
class 1 and class 2 indicates the amount of deviation in the nonstationarity between 
the two classes. Consequently, the dissimilarity measure based on nonstationarity, 
which is used in a two-class problem, can be defined as: 

, . varfvkl 
D,{v\v^)^^^^ (9.8) 



var[v2^j 

where the symbols y and k represent the decomposition level and subfrequency band 
of the wavelet coefficients, respectively. The symbols v^ and v are variance 
vectors. Each of them contains L variances, obtained by equally segmenting the 
wavelet packet coefficients at node (/, k) for class 1 and class 2 signals, respectively. 
For example, given a signal in class 1 with 4,096 data points, there will be 1,024 
wavelet packet coefficients at node (2, 1), since 4,096/2 — 1,024. If these wavelet 
packet coefficients are equally partitioned into eight segments (i.e., L — 8), then 
there will be eight elements in variance vector v^. Each of the elements is calculated 
from 128 wavelet packet coefficients (1,024/8 = 128). Variance vector v^ can be 
obtained in the same way. 
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Similarly, for multiple class problems, the dissimilarity measure based on 
nonstationarity is expressed as: 

^4({v'"}Li) = E E^4K,v'') (9.9) 

where L is the number of classes. Equation (9.9) indicates that the dissimilarity 
measure of multiple class problems is the summation of nonstationarity for each 
pair of two-classes among all the classes. 



9.2 Local Discriminant Bases 

Utilizing one of the dissimilarity measures introduced earlier (e.g., relative 
entropy), the LDB algorithm can identify wavelet packet nodes that exhibit high 
discrimination, as indicated by a large statistical distance among the classes. 

Let us assume that Qqq denotes the wavelet packet node of the parent tree (i.e., 
the signal itself). Then at each level, the wavelet packet node f2,,^ is split into two 
mutually orthogonal subspaces (i.e., nodes Qj+i^2k and Qj+i,2k+\), given by 

%* = Qj+l,2k © %l,2<r+l (9.10) 

wherey indicates the level of the tree, and k represents the node index in level /, given 
by ^ = 0, . . ., 2^ — 1. This process is repeated until level J, giving rise to 2' mutually 
orthogonal subspaces. The goal is to select a set of best subspaces that provide 
maximum dissimilarity information among different classes of the signals. This can 
be realized by a pruning approach, where the wavelet packet tree is pruned in such a 
way that, starting from the bottom decomposition level, a node is split if the 
cumulative discriminative measure of the children nodes is greater than that of the 
parent node. In other words, a node is split if the children nodes have better 
discriminative power than that of the parent node. Such a process is executed until 
it reaches the top level of the decomposition. As a result, the process will end with a 
subset of wavelet packet nodes that contribute to maximizing the statistical distance 
among different classes. As an example. Fig. 9.1 shows a wavelet packet tree for a 
two-level signal decomposition. The LDB algorithm first compares the discriminant 
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Fig. 9.1 Illustration of all 
nodes in two-level wavelet 
packet decomposition 
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power associated with the coefficients of training signals in different classes at 
the Qi I node with that of the 02,2 and 02,3 nodes, respectively. If the relative 
entropy of fli 1 is larger than that of 02,2 and ^2,3, it keeps the bases belonging to 
node Qii and omits the other two nodes (f32,2 and ^223). Otherwise it keeps the 
two nodes {Q22 and ^223) and disregards the basis of node Q]i. This process is 
applied to all the nodes in a sequential manner, up to the scale j = 0. As a result, a 
set of complete orthogonal wavelet packet bases having the highest discriminant 
power are obtained, which can be sorted out further for classification, according 
to a decreasing order. 

Suppose that A,,/^^ represents the desired local discriminant base restricted to the 
span ofBj,!^, which is a set of wavelet packet coefficients at (/, k) node, and zly,^ is the 
array containing the discriminant measure of the same node, then the LDB algo- 
rithm for selecting the optimal wavelet packet base can be summarized as follows 
(Tafreshi et al. 2005): 

LDB Algorithm Given a training dataset that consists of L class of signals 

{{x] },ii},^[ with Ni being the total number of training signals in class /, 

Step 0: Choose a time-frequency analysis method, such as the wavelet packet 

transform, to decompose the signals in the training dataset. 
Step 1 : Select a dissimilarity measure (e.g. , relative entropy D 1 ( {p"' }^^^ 1 ) ) to apply 

on the wavelet packet coefficients to the corresponding nodes (/, k) of the 

wavelet packet trees. 
Step 2: Set A/,^ = B/,^, where B,,i^ is the basis set spanning subspace of Qy,^ node 

(/, k), and then evaluate zl/,^ for ^ = 0, . . ., 2' - 1. 
Step 3 : Determine the best subspace Ay,^ forj = / - 1 , . . . , 0, ^ = 0, . . . , 2' - 1 by the 

following rule: 

Set Aj,i^ as the dissimilarity measure, e.g., Aj,i^ — Di{{p'"}^j^\) 

If ^yjit > ^j+i,2k + ^7+i,2/t+i5 i-e-5 if the discriminant power of a parent node 

in wavelet packet tree is greater than those of children nodes. 
Then 

Else 

Aj,i, =: Zl;+i,2A- 4+1.2't+l and set Zl,-,^- = Zl,+ i,2«: + /l;+i,2i-+l. 

Step 4: Order sort the chosen basis functions by their power of discrimination in a 

decreasing order. 
Step 5: Select the first k{<l) highest discriminant base functions. 

After step 3 is performed, a complete orthogonal basis is constructed. Orthogo- 
nality of the bases ensures that wavelet coefficients used as features during classifi- 
cation process are uncorrelated as much as possible. Subsequently, one can simply 
choose the first k highest discriminant bases in step 5 and use the corresponding 
coefficients as features in a classifier, or employ a statistical method, such as 
Fisher's criteria, to reduce the dimensionality of the problem first and then apply 
them into a classifier. 
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9.3 Case Study 

To evaluate the effectiveness of the wavelet packet bases constructed using the 
LDB algorithm, three classes of signals are synthetically formed: 



x''' (f) = Sine(r) + «i (?) for class 1 

x^^'{t) = Gauspuls(;) + 112(1) for class 2 
x^^\t) = Tripuls{t) + «3(f) for class 3 



(9.11) 



In (9.11), Sine(r), Gauspuls(r), and Tripuls(?) represent the sinusoidal, Gaussian- 
modulated sinusoidal pulse, and triangle wave signals, respectively. The terms 
«i(0, n2{t), and nT,{t) represent white noise. For each class, 100 training signals 
and 1 ,000 test signals were constructed, and the white noise was regenerated each 
time. Figure 9.2 shows one sample signal with 64 sampling points from each class. 
Each sample signal can be decomposed up to the six level (i.e., 2^ — 64), and the 
total number of nodes contained in the wavelet packet library for the signal is 127 
(i.e., 1 for the level, 2 for the first level, . . ., 64 for the sixth level). 

The LDB algorithm is first applied to the training signals to select a subset of 
wavelet packet nodes from a wavelet packet library that best discriminate the three 
classes. Figure 9.3 shows the selected wavelet packet nodes. It can be seen that the 
selected wavelet packet nodes (highlighted in black color) are distributed across 
different decomposition levels. Altogether, they form complete orthogonal bases. 
The first six selected LDB bases are shown in Fig. 9.4, with each containing 
64 coefficients. It should be noted that these bases are sorted according to their 
discriminant power. A complete discrimiant power for all the 64 LDB bases is 
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Fig. 9.2 Sample waveform from (a) class 1, (b) class 2, and (c) class 3 
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Fig. 9.3 The wavelet packet nodes selected by the LDB algorithm 
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Fig. 9.4 The first six LDB bases selected from the signals 



shown in Fig. 9.5 with a decreasing order. A rapid decrease of the discriminant 
power relative to the LDB bases is seen after the first few bases. Therefore, only the 
first few bases (e.g., the first six bases) with large discriminative power are 
considered for purpose of classification. 

Wavelet coefficients constructed by projecting the signals onto the selected 
bases are then used to form feature variables for classification. For the training 
data set, two features (i.e., two wavelet coefficients) generated by the top two LDB 
bases have produced the clustering result as shown in Fig. 9.6. 

To classify the three synthetic signals introduced above, these two features are 
used as input to a classifier. Various classifiers, such as linear discriminant analysis 
(LDA), neural network (NN), and support vector machine (SVM) can be considered 
for this purpose. In this example, the LDA classifier is selected due to its simplicity 
(Duda et al. 2000). Taking the two features as inputs to the LDA classifier, the 



9.3 Case Study 



157 




30 40 

LDB Base 



Fig. 9.5 Discriminant power of all 64 LDB bases 
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Fig. 9.6 Training signals represented by selected top two LDB features 



training dataset are classified. The test signals are subsequently projected onto the 
top two LDB vectors to produce coefficients. Figure 9.7 indicates the scatter plot 
of the testing dataset by the two features. It is seen that, using the LDA classifier, all 
the testing data set are classified successfully. 



158 



9 Local Discriminant Bases for Signal Classification 



(U 1 

Ph -2 



-3 

-4 

-5 









- • 




^P' 


_ 


^ 


^ ^ ■ 


- 


+ 


1 


° Class 1 
V Class 2 


+ Class 3 







-4 -3-2-10 1 2 

Feature 1 

Fig. 9.7 Testing signals represented by selected top two LDB features 

The above case study illustrates that the LDB-based wavelet packet base selec- 
tion method is effective in producing features that enables effective discrimination 
of different classes. 



9.4 Application to Gearbox Defect Classification 



Application of the LDB algorithm to real-world classification problems has been 
reported in various areas, such as geophysical acoustic waveform classification 
(Saito and Coifman 1997), radar signal classification (Guglielmi 1997), automatic 
target recognition (Spooner 2001), ultrasonic echoes classification (Christian 
2002), audio signal classification (Umapathy et al. 2007), biomedical signal analy- 
sis (Englehart et al. 2001; Umapathy and Krishnan 2006), and fault classification of 
mechanical systems (Tafreshi et al. 2005; Yen and Leong 2006). In this chapter, the 
LDB algorithm is applied to classifying the severity of defects in gearbox. Figure 
9.8 illustrates the experimental set-up (Rafiee et al. 2007) where the vibration from 
a four-speed motorcycle gearbox is measured. An electrical motor drives the 
gearbox at a constant nominal rotational speed of 1,420 rpm. A tachometer mea- 
sures the actual rotational speed to account for fluctuations caused by the load 
variations. Vibration signals are measured by a triaxial accelerometer mounted on 
the outer surface of the gearbox's housing, close to the input shaft of the gearbox. 
Four different working conditions of the test gear, including faultless, slight-worn, 
medium-worn, and with broken-teeth, are examined by analyzing the measured 
vibration data. The signals are sampled at 16,384 Hz. The sampled signals under the 
four different working conditions are shown in Fig. 9.9. 
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Fig. 9.8 Experimental setup to test a four-speed motorcycle gearbox 
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Fig. 9.9 Raw vibration signals of four gearbox conditions: (a) faultless, (b) slight-worn, (c) medium- 
worn, and (d) broken-teeth 
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To classify the gearbox defect under different working conditions, 60 training 
signals and 80 testing signals, each containing 1,024 data points, are segmented 
from the raw signal corresponding to each defect condition. The LDB algorithm is 
then applied to the training data, where the relative entropy was chosen as the 
discriminant measure. Figure 9.10 shows the wavelet packet nodes selected from a 
four-level signal decomposition, and the first 6 LDB bases are shown in Fig. 9.11 

To evaluate the effectiveness of the LDB algorithm, the performance of classifi- 
cation by using the testing data are compared between the LDB-selected nodes and 
all the 16 nodes at the 4th decomposition level. The energy values from the selected 
nodes are calculated and then used as inputs to a LDA classifier for characterizing 
the severity of the defect. Figures 9.12 and 9.13 illustrate the distribution of two 
energy features from node (3, 4) and node (4, 15) for the training and test data, 
respectively. The results of classification of gearbox defect severity are listed in 
Table 9. 1 . It is seen that, for the training data, although the misclassification rate for 
features with and without basis selection is the same, the LDB-selected features 
have resulted in a lower misclassification rate (1.56%) than those without (2.19%), 
for the testing data. This indicates the merit of the LDB in signal classification. 

Level 
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Fig. 9.10 Selected wavelet packet nodes for the gearbox data by the LDB algorithm 
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Fig. 9.11 The first six LDB 
bases selected from the 
gearbox vibration signals 
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Fig. 9.12 Illustration of training samples by two features 



xlO^ 



X 10-" 




2 3 

Energy of node (3,4) 



xlO* 



Fig. 9.13 Illustration of testing samples by two features 
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Table 9.1 Effect of wavelet packet node selection on gearbox defect severity classification 





Misclassification rate 




Features 


Training signals (%) 


Testing signals (%) 


Wavelet packet nodes w/o LDB 
Wavelet packet nodes w/LDB 


0.83 
0.83 


2.19 
1.56 



9.5 Summary 

The LDB provides an effective platfomi for the wavelet packet transform to 
decompose and classify signals. In this chapter, we have demonstrated that, using 
this approach, working conditions of a gearbox can be successfully classified by 
analyzing the measured vibration signals. Research on the theory of LDB has been 
continued in recent years. For example, the probability density of each class is 
estimated from the wavelet packet nodes to select the discriminant bases (Saito 
et al. 2002), and the features derived from this approach has been shown to be more 
sensitive to phase shifts than those from the original LDBs. Combing the LDB 
algorithm with signal-adapted filter banks (Strauss et al. 2003), a shape-adapted 
LDB approach has been developed for bio-signal processing. It can be expected that 
more powerful algorithms and computational tools are yet to come to better serve 
the need for signal classification in manufacturing. 
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Chapter 10 

Selection of Base Wavelet 



One of the advantages of wavelet transform for signal analysis is the abundance of 
the base wavelets developed over the past decades - there are a total of 1 3 wavelet 
families documented in the MATLAB library. From such abundance arises a 
natural question of how to choose a base wavelet that is best suited for analyzing 
a specific signal. The question is valid, since the choice in the first place may affect 
the result of wavelet transform at the end. As an example. Fig. 11.1 (top row, left) 
illustrates an impulsive signal and how it may appear as a time series (top row, 
right) in real-world applications. The three rows below illustrate three representa- 
tive base wavelets and the results of using them to analyze the impulsive signal: (1) 
Daubechies wavelet (Daubechies 1992), (2) Morlet wavelet, and (3) Mexican hat 
wavelet. These base wavelets have been used for machine condition monitoring and 
health diagnosis studies, as reported in Shao and Nezu (2004), Li et al. (2000), and 
Abu-Mahfouz (2005). Comparing the wavelet transform results using these wave- 
lets (shown in the right column. Fig. 10.1), it is apparent that only the Morlet 
wavelet is effective in extracting the impulsive component from the signal, as 
illustrated by the similarity in the waveform between the corresponding wavelet 
coefficient and the impulsive component. The Daubechies and Mexican-hat wave- 
lets, in comparison, did not fully reveal the characteristics of impulsive component. 
Such an example motives the study of base wavelet selection to achieve optimal 
result in feature extraction from a signal. In this chapter, we first present a general 
strategy for base wavelet selection, from both a qualitative and a quantitative 
aspect. Subsequently, we introduce several quantitative measures that can be used 
as guidelines for wavelet selection, to guarantee effective extraction of signal 
features. 



10.1 Overview of Base Wavelet Selection 

The topic of base wavelet selection has been addressed by researchers from 
different aspects. These prior approaches can be categorized as either qualitative 
or quantitative, and are reviewed in the following two sections. 
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Impulsive component A series of impulsive components 

Daubechies wavelet Coefficients extracted by Daubechies wavelet 




Morlet wavelet Coefficients extracted by Morlet wavelet 
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Mexican-hat wavelet Coefficients extracted by Mexican-hat wavelet 

Fig. 10.1 Impulsive feature extraction using different base wavelets 

10.1.1 Qualitative Measure 

Base wavelets are characterized by a number of properties, such as orthogonality, 
symmetry, and compact support. Understanding these properties will be helpful for 
choosing a candidate base wavelet from the wavelet families for analyzing a 
specific signal. For example, the orthogonality property indicates that the inner 
product of the base wavelet is unity with itself, and zero with other scaled and 
shifted wavelets. As a result, using an orthogonal wavelet will result in efficient 
signal decomposition into nonoverlapping subfrequency bands. High computa- 
tional efficiency can be achieved when orthogonal wavelets are used for imple- 
menting the discrete wavelet transform (DWT, see Chap. 4 for details) and wavelet 
packet transform (WPT, refer to Chap. 5). The symmetric property ensures that a 
base wavelet can serve as a linear phase filter. This is an important property in 
filtering operations, as the absence of it can lead to phase distortion. A compact 
support wavelet is one whose basis function is nonzero only within a finite interval. 
This allows the wavelet transform to efficiently represent signals that have localized 
features. The efficiency of such representation is important for data compression. 
In recent years, the basic properties of wavelets have been extensively investi- 
gated to determine the suitability of a wavelet for specific applications. For example, 
based on the experiments conducted on a total of 23 Brodatz textures (Mojsilovic 
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et al. 2000), it was concluded that the Biorthogonal wavelets with symmetry 
property enabled higher texture classification rate than the Daubechies wavelets, 
which is asymmetrical (e.g., 64.34% for Db3 vs. 82. 17% for Bior3.3r). Similarly, the 
symmetric property of five wavelets (i.e., Haar, Db6, Coif4, Bior5.5, and Bior6.8) 
were reviewed (Fu et al. 2003), from which the Bior6.8 wavelet was chosen as the 
best-suited wavelet to separate the roughness, waviness, and geometrical form of an 
engineering surface into different frequency bands for both functional correlation 
and process diagnosis in manufacturing. In the area of biomedical engineering, the 
regularity and symmetij of base wavelets were considered as essential features for 
auditory-evoked potentials (AEP) signal analysis (Bradley and Wilson 2004). The 
morphology and latency of peaks, which characterize the AEP signal, were pre- 
served when using a symmetric base wavelet, and the smooth peaks contained in the 
AEP signal were well matched when regularity of a base wavelet is greater than two. 
By taking into account the properties of compact support, vanishing moment, and 
orthogonality, the Coiflet 4 wavelet was selected to effectively separate burst and 
tonic components in the compound surface electromyogram (EMG) signals 
recorded from patients with dystonia (Wang et al. 2004). In addition to orthogonal- 
ity, the property of complex or real basis was used to guide the choice of the base 
wavelet for electrocardiogram (ECG) signal analysis (Bhatia et al. 2006). The 
Morlet wavelet, Gaussian wavelet, Paul wavelet at order 4, and quadratic B-Spline 
wavelet were preselected as the candidates for ECG events detection and segmenta- 
tion. In the area of image processing, the properties of regularity, compact support, 
symmetry, orthogonality, and explicit expression were used for recommending base 
wavelet for image sequence superresolution (Ahuja et al. 2005). It was concluded 
that the B-Spline family represents the most suitable base wavelet among the four 
candidates (i.e., Daubechies, Symlet, Coiflet, and B-Spline wavelets) for image 
sequence superresolution, as it is orthogonal, symmetric, and has the highest regu- 
larity, smallest support size, and explicit expression. In analyzing power system 
transients (Safavian et al. 2005), the Db4, Coiflet, and B-Spline wavelet were shown 
to be equally well-performing for the transient detection in a power system, as they 
share the same basic properties: finite support size and low vanishing moment. 

Shape matching has been studied as an alternative approach to wavelet selec- 
tion. For example, to measure the timing of multiunit bursts in surface EMGs 
from single trials (Flanders 2002), wavelets of different shapes, such as square, 
triangular, Gaussian and Mexican Hat, were investigated. The Db2 wavelet was 
chosen for its similarity to the shape of motor unit potentials hidden in the EMG 
signal. Also, base wavelets of different shapes were compared with ECG signals 
to determine their appropriateness for extracting a reference base from corrupted 
ECG, for magnetic resonance imaging (MRI) sequence triggering (Fokapu et al. 
2005; Abi-Abdallah et al. 2006). To analyze impulses in vibration signals, 
researchers looked at the geometric shape of wavelets to determine the optimal 
choice (Yang and Ren 2004). It was found that components in a signal may be 
extracted effectively when a base wavelet with similar shape as the component is 
employed. 
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10.1.2 Quantitative Measure 

The various approaches described in Sect. 10.1.1 illustrate the importance of 
choosing an appropriate base wavelet for effective signal processing. However, 
the basis properties of a wavelet only qualitatively determine its suitability for a 
particular application. As far as shape matching is concerned, it is generally 
difficult to accurately match the shape of a signal to that of a base wavelet through 
a visual comparison. These deficiencies motivate the study of quantitative measures 
for base wavelet selection. 

The measures of inequality (Goel and Vidakovic 1995), which includes the Schur 
concave functions such as Shannon entropy and Fishlow's measure (Marshall and 
Olkin 1979), Emelen's modified entropy measure (Emlen 1973), and Schur convex 
functions (Gini's coefficient and Schutz's coefficient by Marshall and Olkin 1979) 
were proposed for wavelet selection in data compression and data denoising. A time- 
series, which was constructed by adding white noise into the sampled Db3 wavelet 
function, was used to evaluate each of the inequality measures. All of them, except 
for the Fishlow's measure, recognized the Db3 wavelet as the best base wavelet 
among a set of wavelets (Dbl-Db20, Db30, Coif8, Coifl2, and Coifl8). 

The Shannon entropy was also utilized to identify optimal base wavelet for 
velocity and temperature time series analysis in atmospheric surface layer (ASL) 
(Katul and Vidakovic 1996). The large scale eddy motion and small scale fluctua- 
tions in the ASL were successfully separated with the chosen Daubechies wavelet. 
In another study (Bedekar et al. 2005), the Shannon entropy was used to choose the 
Daubechies wavelet of order 3 from 23 preselected wavelets as the optimal wavelet 
for radio-frequency intravascular ultrasound (IVUS) data decomposition. It accu- 
rately decomposed 29 out of 30 IVUS data at all levels. The rest of the wavelets 
only decomposed less than 21 IVUS data. 

In the field of biomedical engineering, study on horse gait classification has 
discussed an uncertainty model for wavelet selection (Arafat et al. 2003). The 
model combines ihtffizzy uncertainty with the probabilistic uncertainty to provide 
a better measure, when compared with using either fuzzy or probabilistic uncer- 
tainty alone, for choosing an appropriate base wavelet to improve correct classifi- 
cation of different horse gait signals. 

Study on assessing hypnotic state of anesthetized patients undergoing 
surgery (Bibian et al. 2001) has used the discrimination power, which is defined 
as the difference between the statistical features, such as probability density 
function, in the awake as well as in the anesthetized states, to select the appropri- 
ate base wavelet. It was found that among the Daubechies, Coiflet, Symlet, 
biorthogonal, and reverse biorthogonal wavelets, the Daubechies wavelet at 
order 8 provided the highest discrimination power, thus effectively estimated 
the hypnotic state. In another study on diagnosing cardiovascular ailments in 
patients (Singh and Tiwari 2006), experimental results have revealed the suitabil- 
ity of the Daubechies base wavelet at order 8 for the ECG signal denoising, as it 
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has the maximum cross correlation coefficient between the ECG signal and 
the chosen base wavelets, (Daubechies, Symlet, and Coiflet wavelets). The 
cross correlation measure has also been used in evaluating a base wavelet for 
detecting and locating the partial discharge (PD) occurred in operational trans- 
formers (Ma et al. 2002a, b; Yang et al. 2004). It was found that an optimal base 
wavelet would maximize the correlation coefficient between the signal of interest 
and the base wavelet, resulting in PD pulses being successfully separated from 
electrical noise. 

For image denoising, two criteria, the signal information extraction criterion and 
the distribution error criterion, were proposed to select an optimal wavelet 
for improving the denoising performance (Zhang et al. 2005). The first criterion 
was implemented by calculating the mutual information of wavelet coefficients of 
the image without noise contamination and those of the image with noise contami- 
nation. The second criterion was the difference between the Gaussian and the actual 
distribution of the wavelet coefficients of the image without noise contamination. 
It was reasoned that the smaller the difference is the better the denoising perfor- 
mance will be, as the denoising performance is optimal only if the underlying signal 
distribution is Gaussian. Using these two criteria, it was found that the Biorl.3 
wavelet provided the best performance among the eight wavelets (Biorl . 1 , Biorl .3, 
Bior2.2, Bior2.4, Bior3.3, Db2, Db3, and Db4) investigated when testing four 
benchmark images. 

For automatic ultrasound nondestructive foreign body (FB) detection and 
classification in nonflat surface containers (Tsui and Basir 2006), the relative 
entropy was employed as a wavelet coefficient similarity measure to select the 
best base wavelet. The results have shown that the best base wavelet for FB shape 
classification is Bior3.1, while Haar (or Syml or reverse Biorl. 1) and reverse 
Bior3.9 are the best for spherical and rectangular FB material classifications, 
respectively. 

For analysis of impulses in vibration signals (Schukin et al. 2004), the minimum 
total error and time-frequency resolution were devised to evaluate different base 
wavelets on impulsive parameter identification of a single-degree-of-freedom sys- 
tem model. A comprehensive comparison among ten base wavelets (complex 
B-Spline, Gaussian, Shannon, etc.) indicated that the impulse wavelet is the most 
appropriate base wavelet for the analysis of impulses. 



10.2 Wavelet Selection Criteria 

The importance of the base wavelet has been addressed by various researchers, as 
summarized earlier. This section introduces several quantitative measures in eval- 
uating the performance of base wavelets for the specific application domain of 
condition monitoring and health diagnosis in manufacturing. 
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10.2.1 Energy and Shannon Entropy 

The energy content of a signal is a measure that uniquely characterizes the signal, 
thus can be used for base wavelet selection. The amount of energy contained in a 
signal x{t) is expressed as: 



E, 



if) 



\x{t)\^dt (10.1) 



Similarly, when the signal is represented by discrete sample values 
x{i){i — 1,2, .. . ,N), the amount of energy is given by: 



%) = EW')i' (10.2) 



In (10.2), A^ is the length of the signal expressed by the number of data points, and 
x(i) is the amplitude of the signal. 

The energy content of a signal can also be calculated from its wavelet coeffi- 
cients, and is expressed as: 



p 

^energy 



\wt{s,T)fdsdz (10.3) 



The corresponding sampled version is given by: 

s i 

Equations (10.3) and (10.4) indicate that the energy associated with each particular 
scaling parameter s is expressed as: 

Eenergy{s) = \wt{s,x)fdx (10.5) 

And the energy of the corresponding sampled version is described as: 

N 
Eenergyis) = ^ \wt{s, /) ^ (10.6) 



where A' is the number of wavelet coefficients and wt(s, i) represents the wavelet 
coefficients. 

If a major frequency component corresponding to a particular scale 5 exists in the 
signal, then the wavelet coefficients at that scale will have relatively high 
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magnitudes at the time when this major frequency component occurs. As a resuh, 
the energy related to such frequency component will be extracted from the signal 
when applying the wavelet transform to the signal. For purpose of condition 
monitoring and heahh diagnosis, the higher the energy content extracted from the 
defect-induced transient vibrations is the more effective the wavelet transform of 
the signal will be. Therefore, the energy content can serve as a criterion for 
selecting the base wavelet. This is formulated in the following criterion. 

1 . Maximum energy criterion: The base wavelet that extracts the largest amount of 
energy from the signal being analyzed represents the most appropriate wavelet 
for extracting features from defect-induced transient vibrations. 

Given that for the same amount of energy within a subfrequency band, the specific 
condition of the signal may be significantly different (e.g., only several frequency 
components with high magnitude and others with negligible magnitude vs. a 
widespread spectrum), the spectral distribution (or concentration) of the energy 
needs to be considered also to ensure effective feature extraction. The energy 
distribution of the wavelet coefficients is quantitatively described by the Shannon 
entropy (Cover and Thomas 1991): 

N 

Ee,nropy{s) = ~'^Pi ' log2 Pi (10.7) 

1=1 

where p-, is the energy probability distribution of the wavelet coefficients, defined as: 

^' " E Is) ^ ^ ^ 

^ energy v^ } 

with Y!i=iPi = 1, and /?, • log, /?, = if/?, = 0. 

Equations (10.7) and (10.8) indicate that the entropy of the wavelet coefficients 
is bounded by: 

< Een,ropy{s) < log2 A^ (10.9) 

in which Ee„tropv{s) will be equal to (1) zero, if all other wavelet coefficients are equal 
to zero except for one wavelet coefficient, and (2) log2 A', if the probability of energy 
distribution for all the wavelet coefficients are the same (i.e., \IN). This leads to the 
conclusion that the lower the entropy value is, the higher the energy concentration will 
be. Therefore, an appropriate base wavelet should yield large magnitude at a few 
wavelet coefficients and negligible magnitude at others when the signal is decomposed 
into various scales, leading to the minimum Shannon entropy. The corresponding 
Shannon entropy-based wavelet selection criterion can thus be designed as: 

2. Minimum Shannon entropy criterion: The base wavelet that minimizes the 
entropy of the wavelet coefficients represents the most appropriate wavelet for 
defect-induced transient vibration analysis. 
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Combining the strengths of the two criteria described earlier, we note that an 
appropriate base wavelet should extract the maximum amount of energy from the 
signal being analyzed, while minimizing the Shannon entropy of the corresponding 
wavelet coefficients. This lead to the energy-to-Shannon entropy ratio, which is 
defined as: 

R{s)^IP^^ (10.10) 

In (10.10), the energy Ee„e,-gv{s) and the entropy £entropy('5) are calculated from 
(10.6) and (10.7), respectively. By maximizing the energy-to-Shannon entropy ratio 
R{s), an appropriate base wavelet can be selected from a set of candidate base 
wavelets. This leads to the following criterion for wavelet selection: 

3. Energy-to-Shannon entropy ratio measure: The base wavelet that has produced 
the maximum energy-to-Shannon entropy ratio should be chosen as the most 
appropriate wavelet for defect-induced transient vibration signal analysis. 



10.2.2 Information Theoretic Measure 

The energy and Shannon entropy-related criteria are solely based on the content of 
the wavelet coefficients themselves. Since the coefficients of a signal's wavelet 
transformation are inherently related to the signal, information theoretic measures, 
which describes the relationship between a pair of data sequence, can be explored for 
best suited base wavelet selection. These are introduced in the following sections. 



10.2.2.1 Joint Entropy 

The joint entropy H{X, Y) between two data sequences X and Y is defined to 
measure information associated with them as a whole (Cover and Thomas 1991). 
This is expressed as 



//(X,F) = -^^/,(.t,).)log/,(x,j) (10.11) 

xex yeY 

where p{x, y) is the joint probability distribution of the two data sequences. 

10.2.2.2 Conditional Entropy 

With probability distribution of the data sequence X known, the amount of infor- 
mation contained in the other data sequence Y can be measured by the condition 
entropy H{Y\X) as (Cover and Thomas 1991): 
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//(F|X) = -^p(.v)//(y|X = x) 

= - z^/'W z^piyl^) log piy\x) 

xex yeY 

In (10.12), p(x) is the probability distribution of the data sequence X, and p{y\x) 
denotes the conditional probability distribution of the data sequence Y when the 
data sequence X is known. The conditional probability distribution p{y\x) is 
expressed as (Mendenhall and Sincich 1995): 

p(jW=^ (10.13) 

p{x) 

with/j(jr, y) being the joint probability distribution of the two data sequence X and Y. 
As a result, (10.12) can be further expressed as: 

«(m)--EE/'(-'^)i°g^ 

x€X yeY '^ > 

= - Y^^pi^^y) log P{x,y) + E ^pi^^y) log P{x) 

xex yeY xex yeY 

= H{X, Y) + Y,P{^) log P{x) = H{X, Y) - H{X) (10.14) 

xex 

Equation (10.14) indicates that, given the data sequenced, the condition entropy of 
data sequence Y can be calculated by the joint entropy between the two data 
sequences, minus the entropy of the data sequence X. 



10.2.2.3 Mutual Information 

The mutual information I(X; Y) measures the amount of information that data 
sequence X contains about data sequence Y, which is defined as (Cover and Thomas 
1991): 

= ^^Pi^.y) log p{x,y) - ^^P{x,y) log[ p{x)p{y)] 

xex yeY xex yeY (10.15) 

= - H{x, Y) - Y,p{x) log p{x) - J2p{y) log p{y) 

xex yeY 

= ~H{X,Y)+H{X)+H{Y) 
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Fig. 10.2 Relationships 
among entropies and mutual 
information 
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Equation (10.15) indicates that the mutual information is the sum of the entropies 
H{X) and H(Y), minus the joint entropy H(X, Y). 

The relationships among the joint entropy, condition entropy, and mutual infor- 
mation are illustrated in a Venn diagram (Cover and Thomas 1991), as shown in 
Fig. 10.2. It is noted that the mutual information I{X; Y) is represented by the 
intersection of the two data sequences. The greater the mutual information is, the 
more similar the two data sequences will be. The condition entropy H{X\Y) or 
H{Y\X) expresses the information that is particular to each corresponding data 
sequence itself, while the joint entropy H{X, Y) includes all information of the 
two data sequences. 

The above-described relationship can be applied to base wavelet selection by 
taking the signal to be analyzed and its corresponding wavelet coefficients as two 
data sequences X and Y, respectively. Since defect-induced transient features are 
represented by the wavelet coefficients, a high value of mutual information between 
the vibration signal and wavelet coefficients can be expected when an appropriate 
wavelet is chosen. It should be noted that, when a vibration signal is obtained, the 
information H(X) is fixed. Similarly, the information H(Y) of the wavelet coeffi- 
cients is fixed once a base wavelet is chosen. On the basis of the relationship 
described earlier, low values of both the joint entropy and condition entropy are 
desired for choosing an appropriate wavelet for characterizing defect-induced 
transient vibrations. 

Following are several criteria for base wavelet selection, based on the informa- 
tion theoretic measures. 

4. Minimum joint entropy criterion: The base wavelet that minimizes the joint 
entropy between the signal and the wavelet coefficients represents the most 
appropriate wavelet for defect-induced transient feature extraction. 

5. Minimum condition entropy criterion: The base wavelet that minimizes the 
condition entropy between the signal and the wavelet coefficients represents 
the most appropriate wavelet for defect-induced transient feature extraction. 
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6. Maximum mutual information criterion: The base wavelet that maximizes the 
mutual information between the signal and the wavelet coefficients represents 
the most appropriate wavelet for defect-induced transient feature extraction. 



10.2.2.4 Relative Entropy 

In contrast to the mutual information, which measures shared information between 
two data sequences, the relative entropy (also known as Kullhack Leihler distance or 
the divergence) is a measure of the distance between probability distributions of data 
sequences X and Y (Cover and Thomas 1991). The relative entropy is defined as 

D(X||y) = ^/,(x)log^ (10.16) 



xex 



'piy) 



wiihpix) log;-|| = ifpix) = 0, and/^W logg = oo if piy) = 0. 

Equation (10.16) states that the relative entropy value is always normegative, 
and it is zero if and only if both probability distributions are equivalent [i.e., p{x) = 
p(y)]. The smaller the relative entropy is, the more similar the distributions of the 
two data sequences will be. For applications in machine condition monitoring and 
health diagnosis, it is expected that an appropriately chosen base wavelet will be 
able to extract features related to defect-induced transient vibrations completely. 
Consequently, a small relative entropy value between the signal (i.e., data sequence 
X) and its corresponding wavelet coefficients (i.e., data sequence Y) is desired. The 
following criterion reflects this consideration: 

7. Minimum relative entropy criterion: The base wavelet that minimizes the rela- 
tive entropy between the signal and the wavelet coefficients represents the most 
appropriate wavelet for defect-induced transient feature extraction. 
Synthesizing the above criteria, an appropriate wavelet should minimize the 

joint entropy, condition entropy, and relative entropy while maximizing the mutual 
information. Such consideration is captured in the following information measure: 

/(X; Y) 
'"^"^'^ = HiXJ)xHiY\X)xDiX\\Y) ^'^'^^^ 

In (10.17), the joint entropy H{X, Y), condition entropy H{Y\X), relative entropy 
D{X\\Y), and mutual information I{X;Y) are calculated using (10.11), (10.14), 
(10.16), and (10.15), respectively. Maximizing the information measure Info(5) 
leads to the following comprehensive criterion: 

8. Maximum information criterion: The base wavelet that has produced the maxi- 
mum information value should be chosen to be the most appropriate wavelet for 
defect-induced transient feature extraction. 
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10.3 Numerical Study on Base Wavelet Selection 

To quantitatively evaluate the base wavelet selection criteria described earlier, a 
Gaussian-modulated sinusoidal test signal is numerically simulated. Mathemati- 
cally such a signal can be expressed as 



x(0 



- ^-Pi'-toY 



sin[27r/(f - r,))] 



(10.18) 



The symbol f! denotes the attenuation factor, and to is the time delay of the signal. 
This type of signal has been widely used for simulating transient vibrations 
involved in mechanical systems (Ho and Randall 2000; Schukin et al. 2004; 
Yang and Ren 2004). Figure 10.3 illustrates the test signal, in which the center 
frequency is 48 Hz, and the sampling frequency is 1,024 Hz. In the following, the 
criteria presented in the above sections are evaluated for choosing best suited base 
wavelet from both real-valued and complex-valued wavelets. 



10.3.1 Evaluation Using Real-Valued Wavelets 

The performance of real-valued wavelets on processing the test signal is evaluated 
first, for which the DWT is performed to decompose the test signal. The decomposi- 
tion level L of the wavelet transform is determined by the sampling frequency /]j, and 
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Fig. 10.3 Test signal: Gaussian-modulated sinusoidal signal 
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frequency component to be identified in the signal, as expressed in the following 
equation: 



f, 



2L+1 



^fcl. 



■ Iq 



(10.19) 



In (10.19),/^ is the sampling frequency, and/c/,a, is related to the characteristic 
frequency component of the signal {e.g.,/^./,^,- = 48 Hz for the test signal). In Table 
10.1, the respective frequency ranges covered by each of the decomposition levels 
under the sampling rate of 1,024 Hz are shown. Since the center frequency (48 Hz) 
of the test signal falls within the frequency range of 32-64 Hz, which is covered by 
the decomposition level 4 (corresponding to scale s = 2 = 16), this level is chosen 
for the evaluation of each wavelet. 

Thirty candidate base wavelets were preselected from seven wavelet families. 
The energy extracted from the Gaussian-modulated sinusoidal signal by these 
wavelets is listed in Table 10.2. It is shown that the Meyer wavelet has extracted 
the highest amount of energy, thus is considered the most appropriate base wavelet 
for analyzing the Gaussian-modulated sinusoidal signal with the given parameters. 
It is also found that the amount of energy that is extracted from the signal increases 
with increasing order of the base wavelet, for each wavelet family. This is because 
base wavelets of higher order within a wavelet family possess higher degree of 
regularity. As a result, they are better suited for extracting energy from the 
Gaussian-modulated sinusoidal test signal than their lower-ordered counterparts 
in the same wavelet family. 

The Shannon entropy of the extracted Gaussian-modulated sinusoidal signal is 
then calculated, as listed in Table 10.3. On the basis of the minimum Shannon 
entropy, the Symlet 3 wavelet is considered as the most appropriate base wavelet. 

Table 10.1 Frequency range for each decomposition level under a 1,024 Hz sampling rate 
Decomposition level (L) Frequency range (Hz) Decomposition level (L) Frequency range (Hz) 



256-512 

128-256 

64-128 



32-64 

16-32 

8-16 



Table 10.2 Energy extracted from the test signal: real-valued wavelets 

Base wavelet Energy (J) Base wavelet Energy (J) Base wavelet Energy (J) 



Haar 


33.855 


Coif4 


60.662 


Bior2.6 


53.645 


Db2 


45.546 


Coif5 


61.856 


Bior4.4 


52.310 


Db4 


54.433 


Sym2 


45.546 


Bior5.5 


54.614 


Db6 


58.167 


Sym3 


51.143 


Bior6.8 


58.415 


Db8 


60.207 


Sym4 


54.433 


rBiol.3 


45.326 


DblO 


61.471 


Sym6 


58.167 


rBio2.4 


55.138 


DB20 


63.687 


Sym8 


60.217 


rBio2.6 


55.546 


Coifl 


46.065 


Meyr 


64.146 


rBio4.4 


59.235 


Coif2 


55.038 


Biorl.3 


53.481 


rBio5.5 


61.123 


Coif3 


58.692 


Bior2.4 


49.198 


rBio6.8 


60.464 
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This conclusion is not consistent with the Meyer wavelet selected by the maximum 
energy criterion. To resolve such conflict, the energy-to-Shannon entropy ratio is 
calculated and the results are listed in Table 10.4. From the maximum energy-to- 
Shannon entropy ratio criterion, the Meyer wavelet possesses the highest values, 
thus is considered the most appropriate wavelet to analyze the Gaussian-modulated 
sinusoidal signal. 

Various criteria based on information theoretic measures have also been studied 
to evaluate the performance of each of the real-valued candidate wavelets, as listed 
in Table 10.5-10.8. It is noted that all of the four measures (i.e., joint entropy, 
condition entropy, mutual information, and relative entropy) point to the Meyer 
wavelet as the most suited wavelet when analyzing the Gaussian-modulated sinu- 
soidal signal. This is because it maximizes the maximum mutual information, while 
minimizing the joint entropy, condition entropy, and relative entropy. The compre- 
hensive criterion "maximum information" that integrates the effect of these four 
measures, as illustrated in (10. 17), has also shown that the Meyer wavelet is the most 
appropriate wavelet. This is verified in Table 10.9, in which a maximum information 
value is obtained when the Meyer wavelet is chosen as the base wavelet. 



Table 10.3 


Shannon entropy 


of the extracted 


signal: real-valued wavelets 




Base 


Shannon 


Base 


Shannon 


Base 


Shannon 


wavelet 


entropy 


wavelet 


entropy 


wavelet 


entropy 


Haar 


3.667 


Coif4 


2.945 


Bior2.6 


5.959 


Db2 


3.137 


CoifS 


2.985 


Bior4.4 


5.673 


Db4 


3.475 


Sym2 


3.137 


Bior5.5 


6.042 


Db6 


3.171 


Sym3 


2.800 


Bior6.8 


4.069 


Db8 


3.491 


Sym4 


3.011 


rBiol.3 


4.579 


DblO 


3.121 


Sym6 


3.598 


rBio2.4 


4.665 


DB20 


3.653 


Sym8 


3.613 


rBio2.6 


4.949 


Coifl 


2.856 


Meyr 


2.959 


rBio4.4 


4.664 


CoiG 


3.609 


Biorl.3 


6.197 


rBio5.5 


5.034 


Coif3 


3.617 


Bior2.4 


6.214 


rBio6.8 


4.069 



Table 10.4 Energy-to-Shannon entropy ratio of the extracted signal: real-valued wavelets 





Energy- 


to-Sharmon 


Base 


Energy- 


to-Shannon 


Base 


Energy-to-Shannon 


Base wavelet 


entropy 


ratio 


wavelet 


entropy 


ratio 


wavelet 


entropy ratio 


Haar 


9.229 




Coif4 


20.594 




Bior2.6 


9.002 


Db2 


14.512 




Coif5 


20.719 




Bior4.4 


9.220 


Db4 


15.662 




Sym2 


14.512 




Bior5.5 


9.047 


Db6 


18.341 




Sym3 


18.265 




Bior6.8 


14.356 


Db8 


17.246 




Sym4 


18.079 




rBiol.3 


9.896 


DblO 


19.693 




Sym6 


16.162 




rBio2.4 


11.817 


DB20 


17.428 




Sym8 


16.662 




rBio2.6 


11.221 


Coifl 


16.124 




Meyr 


21.678 




rBio4.4 


12.699 


Coif2 


15.244 




Biorl.3 


8.629 




rBio5.5 


12.141 


Coif3 


16.224 




Bior2.4 


7.915 




rBio6.8 


14.858 
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Table 10.5 Joint entropy of the extracted signal: real-valued wavelets 



Base wavelet 


Joint entropy 


Base wavelet 


Joint entropy 


Base wavelet 


Joint entropy 


Haar 


4.086 


Coif4 


3.261 


Bior2.6 


3.414 


Db2 


3.409 


Coif5 


3.246 


Bior4.4 


3.338 


Db4 


3.394 


Sym2 


3.398 


Bior5.5 


3.415 


Db6 


3.495 


Sym3 


3.291 


Bior6.8 


3.379 


Db8 


3.358 


Sym4 


3.250 


rBiol.3 


3.603 


DblO 


3.256 


Sym6 


3.441 


rBio2.4 


3.329 


DB20 


3.086 


Sym8 


3.336 


rBio2.6 


3.619 


Coifl 


3.082 


Meyr 


2.957 


rBio4.4 


3.341 


Coif2 


3.614 


Biorl.3 


3.682 


rBio5.5 


3.348 


Coif3 


3.366 


Bior2.4 


3.313 


rBio6.8 


3.453 


Table 10.6 Condition entropy 


of the extracted 


signal: real-valued wavelets 




Base 


Condition 


Base 


Condition 


Base 


Condition 


wavelet 


entropy 


wavelet 


entropy 


wavelet 


entropy 


Haar 


1.539 


Coif4 


0.715 


Bior2.6 


0.792 


Db2 


0.851 


Coif5 


0.699 


Bior4.4 


0.868 


Db4 


0.847 


Sym2 


0.851 


Bior5.5 


0.833 


Db6 


0.948 


Sym3 


0.745 


Bior6.8 


1.057 


Db8 


0.812 


Sym4 


0.704 


rBiol.3 


0.783 


DblO 


0.710 


Sym6 


0.895 


rBio2.4 


1.073 


DB20 


0.539 


Sym8 


0.790 


rBio2.6 


0.795 


Coifl 


0.536 


Meyr 


0.411 


rBio4.4 


0.802 


Coif2 


1.067 


Biorl.3 


1.136 


rBio5.5 


0.907 


Coif3 


0.819 


Bior2.4 


0.767 


rBio6.8 


0.792 



Table 10.7 Mutual information of the extracted signal: real-valued wavelets 





Mutual 


Base 


Mutual 


Base 


Mutual 


Base wavelet 


information 


wavelet 


information 


wavelet 


information 


Haar 


1.243 


Coif4 


1.261 


Bior2.6 


1.045 


Db2 


0.69 


Coif5 


1.317 


Bior4.4 


1.101 


Db4 


1.074 


Sym2 


0.869 


Bior5.5 


1.165 


Db6 


1.151 


Sym3 


0.990 


Bior6.8 


1.559 


Db8 


1.174 


Sym4 


1.085 


rBiol.3 


0.969 


DblO 


1.291 


Sym6 


1.110 


rBio2.4 


0.873 


DB20 


1.502 


Sym8 


1.241 


rBio2.6 


1.011 


Coifl 


0.760 


Meyr 


1.721 


rBio4.4 


0.913 


Coif2 


1.078 


Biorl.3 


0.511 


rBio5.5 


0.979 


Coif3 


1.148 


Bior2.4 


0.997 


rBio6.8 


1.435 



10.3.2 Evaluation Using Complex-Valued Wavelets 



The criteria for ciioosing an appropriate complex-valued wavelet can be evaluated 
by applying the continuous wavelet transform to the test signal. The scale whose 
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Table 10.8 Relative entropy of the extracted signal: real-valued wavelets 





Relative 




Relative 




Relative 


Base wavelet 


entropy 


Base wavelet 


entropy 


Base wavelet 


entropy 


Haar 


0.4851 


Coif4 


0.163 


Bior2.6 


1.155 


Db2 


1.09 


Coif5 


0.111 


Bior4.4 


0.564 


Db4 


0.506 


Sym2 


1.091 


Bior5.5 


0.423 


Db6 


0.290 


Sym3 


0.764 


Bior6.8 


0.371 


Db8 


0.194 


Sym4 


0.507 


rBiol.3 


0.240 


DblO 


0.125 


Sym6 


0.281 


rBio2.4 


0.182 


DB20 


0.022 


Sym8 


0.194 


rBio2.6 


1.146 


Coifl 


1.149 


Meyr 


0.002 


rBio4.4 


0.985 


Coif2 


0.435 


Biorl.3 


1.155 


rBio5.5 


0.515 


Coif3 


0.266 


Bior2.4 


0.564 


rBio6.8 


0.817 



Table 10.9 Information value of the extracted signal: real-valued wavelets 

Base Information Base Information Base Information 

wavelet value wavelet value wavelet value 



Haar 


0.296 


Coif4 


3.226 


Bior2.6 


0.773 


Db2 


0.232 


Coif5 


5.155 


Bior4.4 


1.054 


Db4 


0.679 


Sym2 


0.232 


Bior5.5 


1.550 


Db6 


1.139 


Sym3 


0.471 


Bior6.8 


2.915 


Db8 


2.146 


Sym4 


0.858 


rBioI.3 


0.187 


DblO 


4.348 


Sym6 


1.221 


rBio2.4 


0.292 


DB20 


40 


Sym8 


2.347 


rBio2.6 


0.461 


Coifl 


0.339 


Meyr 


1,000 


rBio4.4 


0.371 


Coif2 


0.594 


Biorl.3 


0.082 


rBio5.5 


0.537 


Coif3 


1.495 


Bior2.4 


0.633 


rBio6.8 


1.534 



Table 10.10 Energy 
extracted from the test signal: 
complex-valued wavelets 



Base wavelet 



Energy (J) 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Shannon wavelet 
Harmonic wavelet 



96.243 
58.942 
57.257 
14.789 
15.835 



corresponding center frequency is equal to that of the frequency component of 
interest (e.g., 48 Hz in the test signal) is chosen for the wavelet transform. In 
general, the scale of the wavelet, s, and the corresponding center frequency of the 
scaled wavelet,/s_c, are related by (Abry 1997): 



fqfh.c 
" fs_c 



(10.20) 
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where /q is the samphng rate, /b_{c is the center frequency of the base wavelet, and 
/s c is the center frequency of the scaled wavelet. 

Table 10.10 lists the energy values extracted from the test signal, whereas Table 
1 0. 1 1 lists the corresponding Shannon entropy of the extracted signal. It is seen that the 
maximum energy criterion selected the complex Moiiet wavelet as the most suited 
wavelet among the five complex-valued wavelets, while the minimum Shannon 
entropy criterion identified the Complex Gaussian wavelet. Such a conflict is resolved 
by the integrated energy-to-Shannon entropy ratio criterion. In Table 10. 12, it is shown 
that the Complex Morlet wavelet led to the maximum energy-to-Shannon entropy ratio. 
As a result, the Complex Morlet wavelet is considered the most suited base wavelet. 

The information theoretic measures can also be used to evaluate performance for 
each of the candidate wavelets, as listed in Table 10.13-10.16. It is noted that the 
Complex Morlet wavelet is again identified as the most suited wavelet by using the 
following three criteria: minimum joint entropy, the minimum condition entropy. 



Table 10.11 Shannon 
entropy of the extracted 
signal: complex- valued 
wavelets 



Table 10.12 Energy-to- 
Shannon entropy ratio of the 
extracted signal: complex- 
valued wavelets 



Table 10.13 Joint entropy of 
the extracted signal: complex- 
valued wavelets 



Base wavelet 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Shannon wavelet 
Harmonic wavelet 



Base wavelet 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Shannon wavelet 
Harmonic wavelet 



Base wavelet 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Sharmon wavelet 
Harmonic wavelet 



Shannon entropy 



7.322 
7.290 
7.365 
7.690 

7.453 



Energy-to-Shannon 
entropy ratio 



13.143 

8.085 

7.772 

1.923 

2.125 



Joint entropy 



3.121 
3.280 
3.248 
4.167 
3.639 



Table 10.14 Condition 
entropy of the extracted 
signal: complex-valued 
wavelets 



Base wavelet 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Shannon wavelet 
Harmonic wavelet 



Condition entropy 



0.575 
0.734 
0.702 
1.614 
1.093 
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Table 10.15 Mutual 
information of the extracted 
signal: complex-valued 
wavelets 



Base wavelet 



Mutual information 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Shannon wavelet 
Harmonic wavelet 



1.665 
1.808 
1.705 
1.296 
1.551 



Table 10.16 Relative 
entropy of the extracted 
signal: complex- valued 
wavelets 



Base wavelet 



Relative entropy 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Shannon wavelet 
Harmonic wavelet 



0.009 
0.060 
0.011 
0.214 
0.027 



Table 10.17 Information 
value of the extracted signal: 
complex-valued wavelets 



Base wavelet 



Information value 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Shannon wavelet 
Harmonic wavelet 



111.111 

12.346 

66.667 

0.864 

13.889 



and the minimum relative entropy. However, when the maximum mutual informa- 
tion criterion is appUed, the Complex Gaussian wavelet is identified as the winner. 
We apply once again the comprehensive criterion "maximum information," and 
the conflict is successfully resolved. As shown in Table 10.17, a maximum 
information value is obtained when the complex Morlet wavelet is chosen as 
the base wavelet. 

The reason why the Complex Morlet wavelet is the most suited base wavelet for 
analyzing the Gaussian-modulated sinusoidal signal can be explained from a 
physical point of view by comparing their corresponding analytical expressions, 
as shown in (10.18) and (10.21) below: 



•AmCO 



1 



fWh 



.e^'U" 



(10.21) 



Tuning the bandwidth /j, and center frequency /^ of the Complex Morlet wavelet, the 
scaled Complex Morlet wavelet can be expressed as: 



m = 



_gy27c48r^-120f 



(10.22) 



Equation (10.22) illustrates a perfect match of the scaled Complex Morlet wavelet 
to the Gaussian-modulated sinusoidal signal given in (10.18). As a result, its 
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wavelet coefficients best represent the test signal, which is why this wavelet has 
extracted the maximum amount of energy from the test signal. 

In summary, using the Gaussian-modulated sinusoidal test signal, we have demon- 
strated how to systematically choose a base wavelet from a number of candidates by 
using the various quantitative measures. The two comprehensive criteria, i.e., engery- 
to-Shannon entropy measure and maximum information measure, have shown to be 
effective in choosing the most suited base wavelet for decomposing vibration signals 
for machine condition monitoring and health diagnosis. 



10.4 Base Wavelet Selection for Bearing Vibration Signal 

We now demonstrate how the two comprehensive wavelet selection criteria, 
maximum energy-to-Shannon entropy ratio and maximum information measure, 
have been applied to selecting base wavelet for bearing vibration signals. Figure 
10.4a illustrates the waveform of a vibration signal measured from a ball bearing that 
contains a localized defect on its outer raceway. The sampling rate is 10,000 Hz. The 
spectrum in Fig. 10.4b indicates a major peak frequency component at 1,840 Hz. 
This component is used as the reference base for determining the decomposition 
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Fig. 10.4 Bearing vibration signal and its corresponding spectrum, (a) Tirne domain waveform 
and (b) frequency domain spectrum 
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level (for DWT) as well as for the scale selection (when performing CWT). The 
two criteria have been applied to evaluating real-valued and complex-valued 
wavelets, respectively. 

The real-valued wavelets are evaluated first. The decomposition level of the 
DWT is chosen to be 2, which corresponds to scale 4(s — l}). This scale covers the 
frequency range from 1,250 to 2,500 Hz, within which the major peak frequency 
component is located (at 1,840 Hz). After calculation of the energy and Shannon 
entropy values of the bearing vibration signal by each of the candidate wavelets, the 
energy-to-Shannon entropy ratio is calculated for each wavelet, and the results are 
listed in Table 10.18. On the basis of the maximum energy-to-Shannon entropy ratio 
criterion, the reverse Biorthogonal wavelet 5.5 (denoted as rBio5.5) was considered 
as the most appropriate wavelet for analyzing the bearing vibration signal. 

Various similarity measures, including joint entropy, condition entropy, relative 
entropy, and mutual information, have also been calculated to evaluate the candi- 
date base wavelets. By integrating these similarity measures into the maximum 



Table 10.18 Energy-to-Shannon entropy ratio of the extracted bearing vibration signal: real- 
valued wavelets 



Base 


Energy-to-Shannon 


Base 


Energy-to-Shannon 


Base 


Energy-to-Sharmon 


wavelet 


entropy ratio 


wavelet 


entropy ratio 


wavelet 


entropy ratio 


Haar 


56.279 


Coif4 


75.980 


Bior2.6 


69.647 


Db2 


80.793 


Coif5 


76.473 


Bior4.4 


69.864 


Db4 


104.750 


Sym2 


80.120 


Bior5.5 


91.454 


Db6 


71.343 


Sym3 


73.969 


Bior6.8 


77.721 


Db8 


74.153 


Syra4 


59.229 


rBiol.3 


43.843 


DblO 


93.488 


Sym6 


77.946 


rBio2.4 


69.435 


DB20 


85.949 


Sym8 


68.515 


rBio2.6 


70.795 


Coifl 


66.550 


Meyr 


77.757 


rBio4.4 


76.204 


Coif2 


72.738 


Biorl.3 


39.720 


rBio5.5 


109.920 


Coif3 


75.050 


Bior2.4 


63.477 


rBio6.8 


78.777 



Table 10.19 Information value of the extracted bearing vibration signal: real-valued wavelets 



Base 


Information 


Base 


Information 


Base 


Information 


wavelet 


value 


wavelet 


value 


wavelet 


value 


Haar 


0.106 


Coif4 


0.143 


Bior2.6 


0.180 


Db2 


0.223 


Coif5 


0.142 


Bior4.4 


0.181 


Db4 


0.143 


Sym2 


0.223 


Bior5.5 


0.219 


Db6 


0.127 


Sym3 


0.180 


Bior6.8 


0.167 


Db8 


0.116 


Sym4 


0.108 


rBiol.3 


0.105 


DblO 


0.136 


Sym6 


0.171 


rBio2.4 


0.162 


DB20 


0.129 


Sym8 


0.122 


rBio2.6 


0.169 


Coifl 


0.173 


Meyr 


0.108 


rBio4.4 


0.140 


Coif2 


0.169 


Biorl.3 


0.101 


rBio5.5 


0.242 


Coif3 


0.124 


Bior2.4 


0.179 


rBio6.8 


0.158 
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Table 10.20 Energy-to- 
Shannon entropy ratio of the 
extracted bearing vibration 
signal: complex- valued 
wavelets 



Base wavelet 



Energy-to-Sharmon 
entropy ratio 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Shannon wavelet 
Harmonic wavelet 



60.765 
56.044 
35.051 
12.476 
14.504 



Table 10.21 Information 
value of the extracted bearing 
vibration signal: complex- 
valued wavelets 



Base wavelet 



Information value 



Morlet wavelet 
Gaussian wavelet 
B-Spline wavelet 
Shannon wavelet 
Harmonic wavelet 



0.189 
0.068 
0.105 
0.017 
0.091 



information criterion, it is found that the reverse Biorthogonal wavelet 5.5 is the 
most suited wavelet for analyzing the bearing vibration signal. Details of the results 
are listed in Table 10.19. 

Continuous wavelet transform is also applied to analyze the bearing signals, 
using the five commonly seen complex-valued wavelets. The energy and Shannon 
entropy values of the bearing vibration signal extracted by each wavelet are first 
calculated, and their corresponding energy-to-Shannon entropy ratios are then 
determined. As listed in Table 10.20, the Complex Morlet wavelet indicates the 
maximum energy-to-Shannon entropy ratio, thus is considered the most appropriate 
base wavelet for bearing signal analysis. 

In addition, the value of the information measure of the bearing vibration signal 
extracted by each candidate wavelet is calculated, and the result is shown in Table 10.21. 
Based on the maximum information criterion, the Complex Morlet wavelet is again 
identified as the most suited wavelet, since it demonstrates the highest information value 
compared to other four candidate wavelets. 



10.5 Summary 



Using a number of quantitative measures, we presented a systematic approach in 
selecting a base wavelet that is best suited for analyzing nonstationary signals, 
typically seen in manufacturing. These measures are examined from two difference 
aspects: (1) their corresponding wavelet coefficients (i.e., the energy and Shannon 
entropy measures) and (2) the relationship between the signal being analyzed and the 
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coefficients of the base wavelet used for the analysis (i.e., joint entropy, condition 
entropy, mutual information, and relative entropy). Based on these measures, two 
comprehensive base wavelet selection criteria (i.e., the maximum energy-to-Shannon 
entropy ratio and the maximum information measure) are identified as the quantita- 
tive measure for determining the best suited wavelet. Both numerical study and 
experimental data analysis have shown that these two criteria provide quantitative 
guidance to base wavelet selection for effective signal analysis. 
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Chapter 11 

Designing Your Own Wavelet 



To achieve effective signal signature extraction, Chap. 10 introduced several 
quantitative measures for selecting appropriate base wavelets from a pool of 
available wavelet families, such as Daubechies, Myer, and Morlet wavelets. This 
chapter introduces a complimentary technique focusing on wavelet customization. 
The goal is to design a wavelet that is specifically adapted to the signal of interest. 
Because such a customized wavelet would have a higher degree of matching with 
the signal than other wavelets, the effectiveness of signature extraction will 
improve. 



11.1 Overview of Wavelet Design 

Researchers have studied various techniques for designing base wavelets. In the late 
1980s to early 1990s, Daubechies' work has led to the publication of orthonormal 
(Daubechies 1988) and biorthonormal (Cohen et al. 1992) base wavelets with 
compact support. These wavelets are independent of the signal to be analyzed. 
Tewfik et al. (1992) have developed cost functions for finding the optimal ortho- 
normal wavelet basis to represent a specified signal within a finite number of scales. 
Their work has been extended by assuming band limited signals and finding the 
optimal M-band wavelet basis within a finite number of scales (Gopinath et al. 
1994), for representing a desired signal. During the same period, Aldroubi and 
Unser (1993) proposed a method to match a wavelet basis to a desired signal by 
either projecting the desired signal onto an existing wavelet basis, or transforming 
the wavelet basis under certain conditions such that the error norm between the 
desired signal and the new wavelet basis is minimum. Recently, Chapa and Rao 
(2000) have developed two sets of equations for designing a wavelet directly from a 
signal of interest. The first set of equations derives expressions for continuously 
matched wavelet spectrum amplitudes, whereas the second set provides a direct 
discrete algorithm for calculating approximations to the optimal complex wavelet 
spectrum. By formulating wavelet design as a constrained optimization problem 
and then solving it by converting the optimization problem into an iterative line- 
search problem through a first-order parameterization of the perfect reconstruction 
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constraint, a signal-adapted, biorthogonal filter banks of finite length was 
constructed by Lu and Antoniou (2001). Later, Shark and Yu (2003) proposed a 
genetic algorithm-based design method to construct orthonormal wavelet filter 
banks with an optimal shift-invariant property. On the basis of a generalized 
Mexican-hat function, the authors also designed a new class of continuous wavelets 
for arbitrary transient signals (Shark and Yu 2006), where signal matching is 
achieved by minimizing the spectral difference between the reference signal and 
the generalized Mexican-hat wavelet. Gupta et al. (2005a) have proposed to 
construct wavelets that are matched to a given signal in the statistical sense. 
The main idea is to first estimate a high-pass wavelet filter from the statistics of 
the signal, and then obtain a FIR/IIR biorthogonal perfect reconstruction filter bank. 
This leads to the construction of a statistically matched wavelet. The authors have 
also designed both biorthogonal and semiorthogonal wavelet from a signal by 
maximizing projection of the signal onto successive scaling subspaces while mini- 
mizing energy of the signal in the wavelet subspace (Gupta et al. 2005b). Using the 
same idea, Guido et al. (2006) designed a spikelet wavelet that has shown improved 
performance on pattern recognition of signals corresponding to neural action 
potentials of HI, a motion sensitive neuron in the fly's visual system. These prior 
efforts motivate our study of application-specific base wavelets for improved 
signature extraction in signals related to manufacturing. 



11.2 Construction of an Impulse Wavelet 

Considering that base wavelets available in the literature (e.g., provided by 
MATLAB) are developed primarily from a mathematical point of view without 
reference to a specific physical system - although in real-world applications, signals 
to be analyzed are generally produced by physical systems - it would be interesting, 
from an intellectual pursuit point of view, to study how to construct a customized 
base wavelet from the physical phenomena being analyzed. Naturally, such con- 
struction process will have to satisfy the mathematical requirement for designing a 
base wavelet. With this in mind, we introduce an impulse wavelet designed for 
analyzing vibration signals measured from rolling bearings, which are widely used 
in manufacturing machines. 

Generally, a base wavelet must satisfy the conditions as described in Chaps. 3 
and 4 to ensure that a signal's wavelet transformation does not result in loss of 
information so that the signal can be properly reconstructed from the corresponding 
wavelet coefficients. Mathematically, such a reconstruction exists if a scaling 
function (pit), which satisfies the following dilation equation (Burrus et al. 1998; 
Cui et al. 1994), can be found as: 
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In (11.1), h„ is a set of scaling coefficients applied to (f>(t - n). Equation (11.1) 
indicates that the dilated version of (/)(?) can be written as the sum of translated 
versions that are scaled by the coefficients h„. Furthermore, it indicates that a 
scaling function at one scale can be constructed from a number of scaling functions 
at a previous scale. In general, the construction of a base wavelet starts from the 
scaling function that satisfies (11.1). Such scaling function is then used to derive the 
base wavelet. 

Assuming an impulsive input is applied to a rolling bearing, a corresponding 
output signal can be defined by the convolution integral in the continuous form as 
(Inman 1996; Lutes and Sarkani 1997): 



x{t) = 



R{T)h{t~x)dx (11.2) 



where R{x) denotes the impulsive input, and x{t) denotes the output signal. 

In the discrete form, the impulsive input 7?(t) is sampled at /?(«), and the output 
signal can be obtained as: 

x{t) =^R{n)h{t ~ n) (11.3) 



In (11.2) and (11.3), the symbol /?(•) represents the impulse response of the rolling 
bearing. Considering the discrete form of convolution expression, the similarity 
between (11.3) and (11.1) becomes apparent: theoutputj:(0in(11.3) can be viewed 
as the sum of translated versions of the impulse response /!(•) that are scaled by the 
input R(n). If the impulse response satisfies (11.1), then it can be used to form a 
scaling function that contains relevant information on the underlying dynamics of 
the bearing being monitored. Subsequently, the scaling function can be used to 
construct a base wavelet for analyzing vibration signals measured from the bearing. 
Because of the nature of such a derivation, it is expected that the base wavelet 
presents a more direct and meaningful decomposition of the bearing signal than the 
standard wavelets commonly found in the literature. 

To construct the base wavelet, several impulse responses of a ball bearing have 
been taken through hammer strikes, as shown in Fig. 11.1. The corresponding 
spectra are shown in Fig. 11.2. It is seen that the frequency components below 
1,500 Hz are consistent in terms of their magnitudes, but those above 1,500 Hz have 
varied. The magnitudes of these high frequency components are smaller than those 
of the lower frequency components. 

Because of their relatively small magnitude and random behavior, the high 
frequency components were treated as noise and removed from the signal with a 
low pass filter. The cutoff frequency for the filter was chosen to be 1 ,500 Hz, as the 
spectral components of each impulse were stable below this frequency. As seen in 
Fig. 11.3 where the original and filtered signals are shown, the filter is effective in 
removing noise from the impulse response and retaining the frequency constant of 
the original signal. 
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Fig. 11.1 Waveform of three consecutive impulse responses from a ball bearing 
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Fig. 11.2 Spectra of three consecutive impulse responses from a ball bearing 



In order for the filtered impulse response shown in Fig. 11.3 to be used as a 
scaling function, it must satisfy the dilation equation, for which the length of the 
support interval of the signal must be at least one. A function with a support interval 
of less than one would have a gap between h„(f){t-n) and h„+i(f)(t-n-\), within which 
nothing contributes to the sum on the right hand side of ( 1 1 . 1 ). Consequently, such a 
function would not satisfy the dilation equation. To address this issue, the impulse 
response is first dilated such that its support is greater than one. After dilation, the 
coefficients h„ are determined from a recursive relationship that is derived from the 
dilation equation. 

As an example of the procedure of satisfying the dilation equation, a standard 
Daubechies scaling function <^{t) (Fig. 1 1.4) is illustrated below. It should be noted 
first that an explicit expression for the Daubechies scaling function does not exist 
(Daubechies 1992). 
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After the sequence of coefficients h„ is obtained, they are used to form an FIR 
filter denoted by // . The FIR fihers H and G are quadrature mirror filters if, for a 
signal x(ty. 



\H*xm + \\G*xit) 



m 



(11.4) 



Together, H and G form a pair of reconstruction filters for the wavelet decompo- 
sition of a signal. This process, called deconstruction, is implemented via the 
adjoints of// and G , which are denoted by H and G, respectively (Kaiser 1994). 
For the Daubechies scaling function 0(0 shown in Fig. 1 1 .4a, the filter coefficients 
are as follows: K = (0.2304, 0.7148, 0.6309, -0.0280, -0.1870, 0.0308, 0.0329, 
-0.0106), « = 0, 1,..., 7 (Misiti et al. 1997). In Fig. 11.4b, the scaled and 
translated version of (/)(0 (i.e., /?„(/>(? - n) for « = 0, 1, 2, . . . , 7) is shown. Since 
(/»(r) is a valid scaling function and h„ are valid filter coefficients, the dilation 
equation is satisfied, as shown in Fig. 1 1.4c. 

The above procedure is repeated for the impulse response as shown below in 
Fig. 11 .5. It should be noted that the impulse response (f)(t) here is a function of the 
bearing dynamics, not an exact solution to the dilation equation. However, a set of 
filter coefficients h„ can be determined such that the impulse response approxi- 
mately satisfies the dilation equation. 

The filter coefficients can be calculated such that the dilation equation is satisfied 
exactly at integer values of t. The solution is recursive: each /i„, for n > 0, can be 
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Fig. 11.5 The impulse scaling function obtained from the ball bearing 
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explicitly determined as a function of (/)(f) and ho, hi, h2, ■ ■ ■ , h„_i. The first 
coefficient, ho, is simply a function of (/)(?)■ These solutions are obtained by eval- 
uating the dilation equation at integer values of t. For t = 1, (11.1) gives 
(/»(1/2)/a/2 = ho4>{l). The terms /7i(/)(0), /i2(/>(— 1), etc., do not appear because 
(pit) = for r < 0. (Recall that (j) is compactly supported.) Thus, /iq is determined by: 

"'= </,(l) ^^^-^^ 

Similarly, for t — 2, (11.1) gives: 

-^</.(l) = /7o(/)(2) + /2i0(l) (11.6) 



Forf = 3: 



■^^(^ ^ hoct){3) + hic^{2) + h2ct){l) (11.7) 



Forr = jV+ 1: 



Tl^i^^) " ^'^^^ ^ ^^ ^ ''"^^^^ "^ ^"^^^ -!) + ■■■ + hN^{l) (11.8) 

Equations (1 1.5)-(1 1.8) determine a recursive definition for each filter coefficient h„. 
With the first coefficient ho given by (1 1.5), the remaining coefficients are given by: 

, 2-'/^^((^+i)/2)-e:::m("+i-^) , ^, 

h„ = ,.,/~° , for n > 1 (1 1.9) 

Since the scaling function (pit) is given by the impulse response of the bearing, each 
of the filter coefficients h„ can be readily determined from (11.5) and (11.9). 
Furthermore, note that the dilation equation can be written as: 

-^ct>Q=hom+i2>'j^('~j^ (11-10) 

Since h„ is given by (11.9), the dilation equation can be rewritten as: 

1 /?\ e'[2-i/2</,(0-+i)/2)-e;:1M(; + 1-^)1 </'(?-;) 

■' >- ' L '-" 1 +hom (11.11) 



V2 VJ 0(1) 
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Collecting terms yields the following fomi for the dilation equation: 



^H: 



j:l[2-'/^{{j+i)/2mt-j)] 



E;l.E;:lM(i+i-^)'/'(^-./) 



/!„</.(?) (11.12) 



Note that only the second term on the right hand side of (11.12) contains filter 
coefficients /?„, which are determined by (11.5) and (11.9). Equation (11.12) serves 
to illustrate the interesting relationship that the dilation equation imposes between 
h„ and (f){t). Particularly, the expression given by (11.12) shows that the dilated 
version of the scaling function is related not only to <p{t — n) scaled by the filter 
coefficient h,„ but also to <p{t — n), scaled by 4>{t), which is evaluated at integer 
values of t. The recursive relationship given by (1 1.9) gives h„ such that the dilation 
equation is satisfied at x = {0, 1, 2, . . . ). At other points, the sum on the right hand 
side of (11.1) might differ from the left hand side. In practical treatment of an 
impulse scaling function such as shown in Fig. 11.5a, (11.5) and (11.9) are first used 
to obtain an initial set of filter coefficients. These coefficients are then optimized by 
minimizing the following error function: 



\ 




^/7„(/)(?-«) dt (11.13) 



The error Ej^^s is a scalar valued function of the vector of filter coefficients h„ , and the 
optimization is accomplished by finding the vector which minimizes £rms- Since 
Zsi-ms is a measure of how well the dilation equation is satisfied, the vector h„ 
minimizing E^^^^ is the best set of filter coefficients that can be obtained from (f){t). 
Using this technique, the filter coefficients are determined to be: h^ = { —0.0529, 
0.4897, 0.9601, 0.4848, 0.1467, 0.2653, 0.1723, 0.1295, 0.1208, 0.0495, -0.0182, 
—0.0255, 0.0131 j, for « = 0, 1, . . . , 12. The translated and scaled versions of (pit) 
corresponding to these h„ (i.e., h„ (f){t — n)) are plotted in Fig. 12.5b. As indicated by 
Fig. 12.5c, the impulse response is an approximate solution to the dilation equation 
(£^rms = 0.0984). The low pass filter coefficients derived from this scaling function (/)(r) 
can then be used to determine the corresponding wavelet if/it) (Young 1993; Mallat 
1998). The coefficients for the high pass reconstruction filter G are determined from 
(1 1.4). The wavelet is evaluated by upsampling G , convolving it with H , and then 
iteratively repeating this procedure: 

h:^,=^g**h: (11.14) 

where ff is a dyadic up-sampling operator. Thus, after A' iterations, \li{t) = //^+i. 
Figure 11.6 shows the result of four iterations of (11.14), which produced a 
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customized wavelet, based on the impulse response of the rolling bearing structure. 
The set of FIR filters based on i//(r) and synthesis based on (f){t) are given in 
Table 11.1, where the filters have been normalized to have a norm of 1/V2. 
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Fig. 11.6 The wavelet derived from the impulse response 



Table 11.1 Normalized filter coefficients 
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11,3 Impulse Wavelet Application 

As an application example, the impulse wavelet is used to diagnose bearing defect. 
Figure 1 1 .7a shows a vibration signal measured on a SKF 6220 ball bearing with a 
0.25-mm hole on its inner raceway. This signal is sampled at 10 kHz, and the 
rotating speed of the bearing is 600 rpm (i.e., corresponding to 10 Hz rotating 
frequency). Based on geometric dimensions of the bearing and the rotating speed 
(Harris 1991), the defect characteristic frequency on the inner raceway of such 
bearing is /bpfii = 58.6 Hz. As illustrated in Fig. 11.7b, such a defect-related 
frequency component is not seen in its power spectral density (PSD) resulted from 
the Fourier transform. 

Utilizing the wavelet integrated with Fourier transform technique, which is 
described in Chap. 7, the same vibration signal is first analyzed by the wavelet 
transform. The impulse wavelet, developed from the impulse response of the rolling 
bearing as described above, is used as the base wavelet. Fourier transform is then 
performed on the wavelet coefficients obtained from the wavelet transform to 
expose explicitly the related frequency components. Figure 11.8 illustrates the 
resulting wavelet coefficients and their corresponding PSD. It is seen that the 
defect-related frequency component /bpfii at 58.6 Hz is clearly shown in the 
spectrum, thus verifying the existence of a localized itmer raceway defect. 

To demonstrate the signature extraction capability of the designed impulse 
wavelet for bearing defect diagnosis, a comparison study is carried out, where 



a 
op 



1 

0.5 



-0.5 

-1 




0.1 



0.2 



0.3 



0.4 0.5 0.6 

Time (s) 



0.7 



0.8 



0.9 




Q 



30 40 

Frequency (Hz) 

Fig. 11.7 Vibration signal and its PSD from a defective bearing 
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Fig. 11.8 Wavelet-integrated Fourier spectrum using the customized impulse wavelet 



five standard base wavelets from the literature: Daubechies 2 and 4 wavelets, 
Coiflets 1, Symlets 3, and Biorthogonal 2.2 (Daubechies 1992; Lou and Loparo 
2004; Zhang et al. 2005) are used to analyze the vibration signal. The upper parts of 
Figs. 11.9-11.13 are intermediate results (i.e., wavelet coefficients) of the 
integrated wavelet-Fourier transform analysis, and the lower parts of these figures 
are their corresponding PSDs. It is shown that all the five standard base wavelets 
can identify the defect-related frequency component, and the results are shown in 
the lower parts of Figs. 11.9-11.13. 

In the spectra of Figs. 11.9-11.13, there exists a frequency component /bpfo2 at 
56.5 Hz, which has a distinct magnitude. Such a frequency component is identified 
as from the ball rotation of another bearing in the support structure (Yan et al. 
2009). To enable a quantitative performance comparison of the developed impulse 
wavelet and other five standard base wavelets, a signal-to-noise ratio measure is 
introduced, which is the amplitude ratio between the defect frequency /bpph and the 
adjacent frequency /bpfo2- As listed in Table 11.2, the impulse wavelet has shown 
the highest signal-to-noise ratio in detecting the defect-characteristic frequency of 
/bpfii = 58.6 Hz. This result can be attributed to the nature of this customized 
wavelet, which is derived from the actual impulse response of the bearing structure. 
The direct link to the dynamics of the bearing and thus inherent better match to the 
bearing signature than a standard wavelet has made it more effective in exposing 
the constituent features for defect identification. 
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Fig. 11.9 Wavelet-integrated Fourier spectrum results using Daubechies 2 (Db2) wavelet 
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Fig. 11.10 Wavelet-integrated Fourier spectrum results using Daubechies 4 (Db4) wavelet 
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Fig. 11.11 Wavelet-integrated Fourier spectrum results using Coiflets 1 (Coifl) wavelet 
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Fig. 11.12 Wavelet-integrated Fourier spectrum results using Symlets 3 (Sym3) wavelet 
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Fig. 11.13 Wavelet-integrated Fourier spectrum results using Biorthogonal 2.2 (Bior2.2) wavelet 



Table 11.2 Comparison of 
signal-to-noise ratios for 
different base wavelets 



Base wavelet 



/bpfii//bpfo2 



Impulse 

Db2 

Db4 

Coifl 

Sym3 

Bior2.2 



9.55 
1.88 
0.27 
1.41 
0.62 
0.43 



11.4 Summary 



This chapter introduces the procedure to design a wavelet based on the dynamics of 
the physical system being analyzed. Using the impulse response of a rolling bearing 
system, an impulse wavelet has been constructed for defect-induced signature 
extraction. Experimental study has verified the effectiveness of the impulse wavelet 
in identifying bearing localized defect of the bearing in its inner raceway, as 
illustrated in the comparative study involving five standard wavelets from the 
literature. Although the impulse wavelet development is based on a specific type 
of bearing, the analytical procedure described in this chapter should be applicable to 
the analysis of other types of mechanical systems. 
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Chapter 12 
Beyond Wavelets 



In previous chapters, we have introduced the theoretical foundation and practical 
applications related to the wavelet transform. The ability of wavelet transform in 
adaptive time-scale representation and decomposition of a signal into different 
subfrequency band presents an efficient signal analysis method without introducing 
calculation burden (Sweldens 1998). Consequently, it has become a prevailing tool 
for nonstationary signal processing (e.g., transient pattern identification and loca- 
tion). Given, however, the great variety of signals that appear in real- world applica- 
tions, there remains plenty of room for continued advancement in the theory of the 
classical wavelet transform. For example, one of the limitations of the wavelet 
transform is to modify the base wavelet function to better analyze signals of finite 
length or duration, instead of infinite or periodic signals (Sweldens 1997). 
In addition, it has limitations in precisely capturing and defining the geometry of 
image edges. In this chapter, we introduce several new developments in signal 
and image processing that address these limitations and extend beyond the scope of 
the classical wavelet transform method (Jiang et al. 2006, 2008; Li et al. 2008; 
Zhou et al. 2010). 



12.1 Second Generation Wavelet Transform 

Second generation wavelet transform (SGWT), as an advanced mathematical tool 
for time-scale representation of signals, has been developed to overcome deficien- 
cies of the classical wavelet transform. Specifically, the mechanism of constructing 
a base wavelet from the translation and dilation of a fixed function has been 
replaced by the so-called lifting scheme (Sweldens 1996, 1998). The resulting 
wavelet transform has the following properties (Uytterhoeven et al. 1997): 

1 . It is a generic method that is faster to calculate and easier to implement than the 
classical wavelet transform. 

2. It can transform signals with a finite length without extension of the signal to 
infinite duration. 
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3. It can be applied to irregular signal samplings and extended for the 
determination of weighting functions. 

4. Its inverse transform shares the same complexity as the forward transform. 



12.1.1 Theoretical Basis of SGWT 

The architecture of the lifting scheme can be illustrated in Fig. 12.1. The forward 
procedure of the lifting scheme, similar to its counterpart in the classical discrete 
wavelet transform, is to obtain both the approximation and detail of the original 
signal. It mainly incorporates three critical operational steps: (1) splitting, 
(2) prediction, and (3) updating. When starting the lifting scheme process, the 
signal x(i) is first split into two subsets, the odd sample Xo^d and the even sample 
J-even, by mcans of a sample sequence. For example, given a signal x{i), 
where / = 1, 2, 3, . . . , 2n (« is a natural number), it will be split as: 



J^odd = {x{2i - 1)} 

Xeyen = {x{2i)} 



1,2,3, 



(12.1) 



When the splitting procedure of the signal x(i) is completed, the odd and even 
subsamples are obtained and the signal is subsampled by a factor of 2. 

Following the splitting operation is the prediction operation, which predicts the 
odd data sample with the even data sample as: 



-^odd — ' l,^^even) 



(12.2) 



In (12.2), P is the prediction operator that is independent of the signal. 
The difference between the predicted result and the odd sample is considered as 
the detail of the original signal, d, described as: 



" — -^odd -^odd -^odd ^(,-^evenj 



(12.3) 
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Fig. 12.1 Forward transform procedure of lifting scheme 
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Given the Xeven and the detail, the approximation can be calculated with the 
updating operator U as: 



U{d) 



(12.4) 



Similar to the prediction operation, the updating operation is also independent 
of the signal to be analyzed. The functions of prediction and updating operators 
are similar to that of a pair of h{n) and g(n) filters in the classical wavelet transform, 
and they can be derived from the scaling function (f){t) and wavelet function xjjif) 
by iteration algorithm (Claypoole 1999; Claypoole et al. 2003). It should be 
noted that the prediction and updating operators can be optimized using different 
algorithms, such as the Claypoole's optimization algorithm (Claypoole 1999; 
Claypoole et al. 2003). 

Based on the forward procedure described above, the signal is decomposed into 
two parts: approximation and detail. This process can be iterated by taking the 
approximation as the input signal to continue the decomposition. Furthermore, by 
iterated decomposition of the detail and the approximation together, wavelet packet 
transform can be realized with the lifting scheme. 

The decomposition is invertible, and the signal reconstruction procedure is 
illustrated in Fig. 12.2. 

As the forward procedure realizes decomposition of the original signal, 
the reverse procedure realizes signal reconstruction. Similar to the forward 
procedure, the reverse procedure involves both the prediction operator and the 
updating operator. This means that the following relationship exists: 



^even = a ~ U{d) 



(12.5) 



The signal can then be reconstructed by merging Xgven and Xq^jj- 
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Fig. 12.2 Reverse transform procedure of lifting scheme 
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12.1.2 Illustration ofSGWT in Signal Processing 



Several examples are illustrated here on the application of the SGWT algorithm. 
The first example involves a signal that consists of two frequency components, 
sampled at 100 Hz as: 



x{t) = sm{2n ■ lit) + sm{2n ■ 4lt) 



(12.6) 



The signal can be separated by one-level SGWT. Performing SGWT on this 
signal, where the db8 wavelet function is used as the starting point to derive the 
prediction and updating operators and to get the approximation part a 1 and detailed 
part dl. The result is shown in Fig. 12.3. The accuracy of the decomposition result 
is evaluated through calculation of the error. Specifically, the absolute values of 
subtracting approximation coefficients in al and detailed coefficients in dl at each 
sampling point from the original signal are summed up, as illustrated below: 



N 

error = > 



= f^[x(/)-(«l(/)+rfl(0)] 



(12.7) 



Performing the above calculation, the resulted error is only 1.26 x 10 
verifies the accuracy of the decomposition. 
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Fig. 12.3 Separation and reconstruction of two sine waves 
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The second example involves an intermittent linear chirp signal as expressed 
in (12.7): 



x{t) 




. ,t+20 
111 1 1 



?e[l,4]U[6,9] 
else 



(12.8) 



The signal is decomposed using the SGWT, and the results are shown in 
Fig. 12.4. 

Adopting the same concept as that in the first example, the error of this transform 
is calculated as 4.09 x 10~'^, which again verifies the accuracy of the decom- 
position result using the SGWT. 

In the area of manufacturing, surface topography has been considered as one 
of the factors that affect the functional performance of components. The features of 
a surface, such as the roughness and waviness, have direct impact on the wear 
rate of the component. To identify these surface features, the SGWT has been used 
for surface analysis (Jiang et al. 2001a, b, 2008). As an example, the bearing surface 
of a worn metallic femoral head is shown in Fig. 12.5a, where two different types 
of scratches (a regular scratch that is related to manufacturing process and a random 
scratch that is generated during the service time) exist (Jiang et al. 2001b). With 
the application of the SGWT to processing the bearing surface, the waviness feature 
can be clearly seen in Fig. 12.5b. 



Original Signai Sample 



0.5 
0- 
-0.5 



1 ^ 
0.5 

-0.5 
-1 _ 




'im 



12 3 4 5 6 7 
Time 






Original Signal Sample 


0.025 




i' rll 


0.02 

0.015 

0.01 


^W\ 


\\ : 


0.005 


— „.v , ,' 


J \ 




9 10 5 1 1 5 20 25 30 35 40 45 50 

Frequency 



■ll 




. ,.„„lt<lf 


\ 

\ At 



10 5 10 15 20 25 30 35 40 45 50 

Frequency 



0.03 
0.025 

0.02 
0.015 

0.01 
0.005 




10 5 10 15 20 25 30 35 40 45 50 

Frequency 
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Fig. 12.5 Bearing surface of a new metallic femoral head (Jiang et al. 2001b). (a) The measured 
surface and (b) SGWT processed surface 



12.2 Ridgelet Transform 

Classical wavelet transform has been found insufficient in addressing certain 
problems in image processing. Prominent among the limitations is the fact 
that wavelets are essentially isotropic (i.e., its characteristics is uniform in all 
directions) in nature and are therefore inadequate for analyzing anisotropic features 
in images (Starck et al. 2006). Such constraint on the applicability of wavelets to 
image processing has led to research in improved methods of representation and 
analysis. One such method is called ridgelet analysis, which was developed 
by researchers at the Stanford University in 1998 (Candes 1998; Candes and 
Donoho 1999). The analysis is based on ridge functions that were known since 
the late 1970s (Logan and Shepp 1975). 



12.2.1 Theoretical Basis of Ridgelet Transform 



Ridgelets and the associated ridgelet analysis present a multiscale representation 
of mathematical functions through the superposition of ridge functions. The ridge 
functions are expressed as r{aixi + a2X2 + • • • + ai,x„) (Candes and Donoho 1999). 
They are a set of functions with n variables, and are constant along the hyperplanes 
fliA'i + a2X2 + • • • + anX„ = c. A graphical representation of the ridgelet functions 
is given in Fig. 12.6. 
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Fig. 12.6 A ridgelet with ridge functions marked by lines parallel to 3'-axis (Starck et al. 2003) 



Furthermore, a ridge function can also be expressed as a multivariate function 

(f : U"-^ R) of a set of real numbers as 



f{xi,X2, . . . ,A-„) = ,?(fliA-i H h a„x„) = g{ax) 



(12.9) 



where g:R—*R is a function of a set of real numbers, and 
a = {ai,a2,... ,a„) G R is a vector representing the direction. The multivariate 
function /with n variables can be further approximately represented by a superpo- 
sition of m (m < n) ridge functions (Candes 1998; Candes and Donoho 1999; Starck 
et al. 2006) as 



f{x\_,X2, . . . ,x„) « ^ c,cr(a,iA-i + aaxi 



Qif]Xijj 



(12.10) 



where c, denotes the coefficients and m denotes the number of ridge functions. 

Ridgelet transform is associated with the ridge functions, and the concept of 
ridgelet transform is similar to that of the Fourier transform in that it is associated 
with the periodic sine and cosine functions, as mathematically expressed below. 

Consider a smooth, univariate function ip, such that ij/ : U. ^ U., with a vanishing 
mean, J il/{t) dt = 0. Given this function, we can further define a bivariate function 



^a,hM 



as (Candes and Donoho 1999): 



^a.h,e{x) 



-1/2. 



x\ cos + X2 sin — fo 



(12.11) 
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In(12.11), A" = (ai,X2) € U , fl>0 is the dilation parameter, /? g IR represents the 
translation parameter, and 9 e [0,2n) represents the direction parameter. 

Equation (12.11) represents a ridgelet, whereas the dilation and translation 
parameters given above perform the function of scaling and translating the 
ridgelet, similar to the dilation (by the scaling factor s) and translation (by the 
time constant i) operations in a wavelet. The function i/^^, ,, g(x) is constant along 
the lines (i.e., ridges) xi cos + X2 sin = constant. Transverse to these ridges, it 
is a wavelet function. Accordingly, the continuous ridgelet transform of any 
integrable bivariate function can be expressed as (Candes 1998; Candes and 
Donoho 1999; Starck et al. 2006). 



7?f(«,/',9) = yiA.,,,fl(-v)/(x) 



djt- (12.12) 



i>2 



It is interesting to note that i/'^,^ ^ is defined on the IR space and the associated 
transform is therefore 2D. The inverse of (12.12) used in reconstruction is given as 
(Candes and Donoho 1999) 



/W=/ / / — ^f(«,^0)>A«.fe,eWd«d/7d0 (12.13) 

Jo J-oc JO " 



12.2.2 Application of the Ridgelet Transform 

The 2D nature of the ridgelet transform makes it very well suited for analyzing and 
processing images. Prominent ridgelet applications include denoising, edge detec- 
tion, and classification of tissues from images of internal human organs. As an 
example. Fig. 12.7 shows two images of a supernova before and after denoising 
using ridgelets, respectively (Starck et al. 2003). 

It can be seen that the original x-ray image (the left side of Fig. 12.7) is blurred 
by the noise, while the image becomes clear after the noise has been filtered out 
using the ridgelet transform (the right side of Fig. 12.7). 

Another example of the application of the ridgelet transform is to characterize 
surface topography (Ma et al. 2005). This is an important issue, as it has impacts on 
the mechanical and physical properties of the system. Figure 12.8 shows the results 
of extracting deep scratches from a honed surface, which were seen in an automo- 
tive engine cylinder (Ma et al. 2005). Their distribution of such scratches on the 
surface and their amplitudes directly affect the gas or air flow and pressure balance 
in an engine. In Fig. 12.8b, deep scratches are reconstructed by means of a ridgelet 
transform. 
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Fig. 12.7 Image denoising using the ridgelet transform. Reproduced from Starck et al. (2003) 



Fig. 12.8 Extraction of 
linear scratches from a honed 
surface of the automotive 
engine cylinder (Ma et al. 
2005). (a) Raw measurement 
surface and (b) extracted 
surface using ridgelet 
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12,3 Curvelet Transform 

Ridgelet transform is a relatively new system of representation and analysis, which 
has been shown to be effective in resolving edges (Candes and Donoho 1999; Do 
and Vetterli 2003; Dettori and Semler 2007). However, ridgelets are limited to 
resolving only straight edges; curved edges cannot be represented with as few 
coefficients as required for a straight edge (Do and Vetterli 2003; Starck et al. 
2003). The inadequacy of wavelets and ridgelets in resolving edges has been the 
primary driving force behind the search for better representation and analysis. 
Curvelet analysis, introduced in the year 2000 (Candes and Donoho 2000), holds 
potential in addressing the shortcomings of wavelets and ridgelets. A brief intro- 
duction to the fundamentals of curvelets is given in this section. 



12.3.1 Curvelet Transform 

The curvelet transform is defined as the inner product of the function / to be 
analyzed and a family of curvelets jaho (Candes and Donoho 2000, 2005a, b): 

Tf{a,b,e) = (f,yM) (12.14) 

where a>0 is the scale parameter, h is the translation parameter, and e [0, 2n) is 
the orientation parameter. The symbol Ff represents the curvelet transform. The 
family of curvelets is explained by starting with two smooth, nonnegative, real 
valued windowing functions called the radial window W{r) and the angular 
window V(t), respectively (Candes and Donoho 2005a, b). The two windowing 
functions are subject to the following two admission conditions: 

f°° 1 

/ -W{arfda=l, V ;->0 (12.15) 

Jo a 

1 
V{tfdt=l (12.16) 



In (12.5), r e (1/2, 2) is the radial coordinate and in (12.6) t e [—1, 1] denotes the 
time variable. According to the definition in (12.14), at a given scale a, a family of 
curvelets can be generated by translation and rotation of a basic element y^go ^s 
shown in (12.17) (Candes and Donoho 2005a, b): 

yM^yaOoiRei-x-h)) (12.17) 

where Re is a 2 x 2 rotation matrix that is related to planar rotation by an angle 9. 
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The basic element itself is expressed mathematically as: 

y«0()('-, oj) = W{ar) v(-^ a^' (12.18) 

where r and (o are polar coordinates defined in the frequency domain. 

Generally, the discrete curvelet transform is often used to process the function/, 
which also starts with two window functions: the radial window W(r) and the 
angular window V{t) (Candes et al. 2006). The transform is subject to the condi- 
tions expressed in (12.19)-(12.21) as: 

oc 

Y^ W{2Jrf = 1 (12.19) 

oc 

Y^ V{t-lf ^ 1 (12.20) 

/— — OO 

Uj{r, 9) = 2-'j/'W{2-Jr)v(-^—\ (12.21) 

In (12.21), the window function f/, is derived from the radial window W(r) and the 
angular window V(t) and expressed in the Fourier domain. The symbols 
;■ e (3/4,3/2) and 9 denote polar coordinates, and t G (—1/2, 1/2) is the time 
variable. 

Based on these notations, a family of curvelets at a fixed scale of 2' is defined as: 

^,,,,,(A-) = ^,(^e,(A--xf')) (12.22) 

where x^ — /?g'(^i2"-', k22^^l^) represents the position information, and 

(cos 9 sin \ 
represents the rotation information in terms of 9 radians, 
—sin 9 cos 9 I 

respectively. 

Accordingly, the inner product between a curvelet ipj , ^. and a function/ re suits in 
the curvelet coefficients as: 



c(/-, /, k) = (/■, ipjj^,) = / fix) ipjj^^ix) dx (12.23) 

where c(J, I, k) are the curvelet coefficients. 

The physical interpretation of (12.23) in the Fourier domain at a scale of 2' can 
be illustrated in Fig. 12.9. The concentric circles represent the family of curvelets 
iPjtic{x) and the shaded portion represents one curvelet from this family. 
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Fig. 12.9 A family of 
curvelets in polar coordinates, 
with the shaded area 
representing support for a 
single curvelet (Candes et al. 
2006) 




While the curvelets described above are expressed in the polar coordinates, 
Cartesian coordinates are desirable to implement the curvelet transfomi (Donoho 
and Duncan 2000; Candes et al. 2006). Consequently, the window functions 
expressed in (12.19)-(12.21) are expressed in the Cartesian coordinates as 






Vj{co) = V 



0)1 



Uj{(o) = Wj{(o) Vj{oj) 



(12.24) 



(12.25) 



(12.26) 



andO<o!»< 1, (/) = 



where 0(ffli,aj2) = (/)(2-^ffi.i)(/)(2->ffl2),2^' <coi< 2i+\~2-il^ < (cuz/ffli) < 2--'/^ 

'1, [-1/2,1/2] 
0, [-2,2] 

Currently, there are two prevalent methods of calculating the curvelet coeffi- 
cients. The first method, called the digital curvelet transform via unequispaced fast 
Fourier transform (Candes et al. 2006), involves the following four steps: 

1. Calculate the 2D FT of the function of interest (f[tj, t2\), to obtain its Fourier 

samples, /[«!, 772]- 

2. Resample /[«i,n2] for each scale/angle pair (/, /) to obtain sampled values 

/[«i, «2-nitan0/]. 

3. Multiply the resampled / with the window function t/,, to obtain 
.//,/[«!, "2] =/[«ij n2 — «i tan0/] Uj[n\,n2\, where tan0/ — 12^^!'^^. 

4. Calculate the inverse of the 2D FT of every fjj to obtain the discrete curvelet 
coefficients, c(/, /, k). 
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The second method is called digital curvelet transform via wrapping (Candes 
et al. 2006), and involves the following computational steps: 

1. Calculation of 2D FT of the function of interest (f[tj, 12]), to obtain its Fourier 

samples, /[«i,«2]. 

2. Calculation of the product Ujj[n\, «2] /[«! , «2]- 

3. Wrapping the product f///[ni,«2] /[«i,«2] around the origin to obtain 

4. Calculation of inverse 2D FFT of every fjj to obtain discrete curvelet 
coefficients, c(/, /, k). 



12.3.2 Application of the Curvelet Transform 

Because of its multiscale nature, most of the curvelet applications are related 
to image processing. Specifically, the curvelet transform has been applied to 
image compression, contrast enhancement, feature extraction from noisy images, 
pattern detection, noise filtration, edge detection, etc. As an example, a 
denoising operation on a test image "Lena" is shown in Fig. 12.10. This image, 
which is 512 x 512 pixels in size (Fig. 12.10a), was contaminated with random 
noise (peak signal-to-noise ratio, PSNR: 22.1), as shown in Fig. 12.10b. Curvelet 
transform is then applied for image denoising via thresholding of its curvelet 
coefficients. The result is a filtered image as seen in Fig. 12.10c, which has an 
improved PSNR value of 3 1 . 1 . 

In manufacturing, the curvelet has been used for surface characterization. 
Fig 12.11a shows a surface of a worn metallic femoral head. When additive white 
noise is added into it, the surface has a signal-to-noise ratio (SNR) of 56.43 dB. 
When the wavelet transform is used to remove the noise, the SNR of the surface has 
increased to 58.72 dB. Finally, the performance of the denoising operation was 
further improved when the curvelet transform is applied to process the surface, 
where the SNR has increased to 61.62 dB. 
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12.10 (a) Original image, (b) image contaminated with noise, and (c) image after denoising 
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Fig. 12.11 Denoising of a microscalar surface (Ma 2007). (a) Original surface (scale unit: urn), 
(b) noisy surface (signal-to-noise ratio, SNR = 56.43 dB), (c) denoised surface using wavelet 
(SNR = 58.72 dB), and (d) denoised surface using curvelet (SNR = 61.62 dB) 



12.4 Summary 



This chapter briefly presents development in signal processing that goes 
beyond the classical wavelet transform. The SGWT based on the lifting 
scheme is first introduced to allow for the design of the base wavelet functions 
to better fit the signal to be analyzed. The ridgelet and curvelet transforms 
are then introduced from the point of view of overcoming the limitations 
in detecting edges (straight and curved) of images when applying the classical 
wavelet transform. These techniques, together with further advancement 
reported in the literature, such as multiwavelet transform (Cotronei et al. 1998), 
dual-tree wavelet transform (Selesnick et al. 2005), and contourlet transform 
(Do and Vetterli 2005), promise to continually push the envelope of signal 
and image processing to better serve the needs for a wide range of engineering 
problems. 
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